much more on minimax (order bounds)
http://www-stat.stanford.edu/~imj/wald/wald1web.pdf
cf. lecture by Iain Johnstone
today’s lecture
• parametric estimation, Fisher information, Cramer-Rao lower bound: Ch. 4, Sec. 9.3
• information and estimation: Ch. 7
• universal denoising: Ch. 8
• (chapters and sections from new version of notes)
mean squared error estimation
bias-variance
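The body of this slide is not in the extracted text. As an added illustration (mine, not from the slides), the bias-variance decomposition MSE = bias² + variance can be checked numerically for a shrinkage estimator c·X̄ of a Gaussian mean; the factor c and all parameter values below are arbitrary choices.

# Added sketch (not from the slides): MSE = bias^2 + variance,
# checked by Monte Carlo for the shrinkage estimator c * Xbar of a Gaussian mean.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n, c = 1.0, 1.0, 10, 0.8                      # arbitrary illustrative values

xbar = rng.normal(theta, sigma / np.sqrt(n), size=10**6)    # sample means over many repetitions
est = c * xbar                                              # shrinkage estimator c * Xbar

mse = np.mean((est - theta) ** 2)
bias2 = (np.mean(est) - theta) ** 2
var = np.var(est)
print(mse, bias2 + var)   # the two numbers agree up to Monte Carlo error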
Fisher Information
exercise:
exercise
note
• the r.h.s. depends on the estimator
• far from tight: consider the estimator that is identically 0
note:
multi-parameter case
Fisher information for a “location family”
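The slide content is not in the extracted text. As an added numerical sketch (mine, not from the slides): for a location family {f(x − θ)}, the Fisher information I(θ) = ∫ (f′(x))²/f(x) dx does not depend on θ. The standard Cauchy density is used below purely as an example, for which I(θ) = 1/2.

# Added sketch (not from the slides): Fisher information of a location family,
# I(theta) = ∫ (f'(x))^2 / f(x) dx, evaluated numerically for a standard Cauchy f.
import numpy as np

x = np.linspace(-200.0, 200.0, 400_001)
f = 1.0 / (np.pi * (1.0 + x**2))                 # standard Cauchy density
fprime = -2.0 * x / (np.pi * (1.0 + x**2) ** 2)  # its derivative, in closed form

integrand = fprime**2 / f
fisher = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))  # trapezoid rule
print(fisher)   # ≈ 0.5, independent of the (suppressed) location parameter theta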
Fisher Information and MMSE
recall
5 Notation and Conventions
Our conventions and notation for information measures, such as mutual information and relative entropy, are standard. The initiated reader is advised to skip this section. If U, V, W are three random variables taking values in Polish spaces 𝒰, 𝒱, 𝒲, respectively, and defined on a common probability space with a probability measure P, we let P_U, P_{U,V}, etc. denote the probability measures induced on 𝒰, the pair (𝒰, 𝒱), etc., while, e.g., P_{U|V} denotes a regular version of the conditional distribution of U given V. P_{U|v} is the distribution on 𝒰 obtained by evaluating that regular version at v. If Q is another probability measure on the same measurable space we similarly denote Q_U, Q_{U|V}, etc.

As usual, given two measures on the same measurable space, e.g., P and Q, define their relative entropy (divergence) by

D(P‖Q) = ∫ log(dP/dQ) dP        (12)

when P is absolutely continuous w.r.t. Q, defining D(P‖Q) = ∞ otherwise. An immediate consequence of the definitions of relative entropy and of the Radon-Nikodym derivative is that if f : 𝒰 → 𝒱 is measurable and one-to-one, and V = f(U), then

D(P_U‖Q_U) = D(P_V‖Q_V).        (13)

Following [5], we further use the notation

D(P_{U|V}‖Q_{U|V} | P_V) = ∫ D(P_{U|v}‖Q_{U|v}) dP_V(v),        (14)

where on the right side D(P_{U|v}‖Q_{U|v}) is a divergence in the sense of (12) between the measures P_{U|v} and Q_{U|v}. It will be convenient to write

D(P_{U|V}‖Q_{U|V})        (15)

to denote f(V) when f(v) = D(P_{U|v}‖Q_{U|v}). Thus D(P_{U|V}‖Q_{U|V}) is a random variable while D(P_{U|V}‖Q_{U|V} | P_V) is its expectation under P. With this notation, the chain rule for relative entropy (cf., e.g., [6, Subsection D.3]) is

D(P_{U,V}‖Q_{U,V}) = D(P_U‖Q_U) + D(P_{V|U}‖Q_{V|U} | P_U)        (16)

and is valid regardless of the finiteness of both sides of the equation.

The mutual information between U and V is defined as

I(U; V) = D(P_{U,V}‖P_U × P_V),        (17)

where P_U × P_V denotes the product measure induced by P_U and P_V. We note in passing, in line with the comment on relative entropy and one-to-one transformations leading to (13), that if f and g are two measurable one-to-one transformations and A = f(U) while B = g(V), then

I(U; V) = I(A; B).        (18)

Finally, the conditional mutual information between U and V, given W, is defined as

I(U; V | W) = D(P_{U,V|W}‖P_{U|W} × P_{V|W} | P_W).        (19)

The roles of U, V, W will be played in what follows by scalar random variables, vectors, or processes.
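For concreteness, here is a small added sketch (not part of the notes) of the quantities defined above on finite alphabets: D(P‖Q) as in (12), the conditional divergence as in (14), and a numerical check of the chain rule (16) on an arbitrary joint distribution.

# Added sketch (not from the notes): definitions (12), (14) and the chain rule (16)
# for finite alphabets, in nats.
import numpy as np

def kl(p, q):
    """Relative entropy D(p || q) for finite distributions p, q (nats)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(1)
P_UV = rng.random((3, 4)); P_UV /= P_UV.sum()    # an arbitrary joint law P of (U, V)
Q_UV = rng.random((3, 4)); Q_UV /= Q_UV.sum()    # an arbitrary joint law Q of (U, V)

P_U, Q_U = P_UV.sum(axis=1), Q_UV.sum(axis=1)    # marginals of U under P and Q
# conditional divergence D(P_{V|U} || Q_{V|U} | P_U), as in (14)
cond = sum(P_U[u] * kl(P_UV[u] / P_U[u], Q_UV[u] / Q_U[u]) for u in range(P_UV.shape[0]))

# chain rule (16): D(P_{U,V} || Q_{U,V}) = D(P_U || Q_U) + D(P_{V|U} || Q_{V|U} | P_U)
print(kl(P_UV.ravel(), Q_UV.ravel()), kl(P_U, Q_U) + cond)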
6 Relative Entropy and Mismatched Estimation
6.1 For slides
• Scalar Channel:
X ≥ 0
Y_γ | X ∼ Poisson(γ · X)
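A small added sketch (not in the notes) of the mutual information over this scalar Poisson channel, computed by direct summation for a binary input; the particular input law and SNR values are arbitrary choices, the result is in nats, and scipy is used only for the Poisson pmf.

# Added sketch (not from the notes): I(X; Y_γ) for Y_γ | X ~ Poisson(γ·X), binary input.
import numpy as np
from scipy.stats import poisson

def poisson_channel_mi(x_vals, p_x, gamma, y_max=500):
    y = np.arange(y_max + 1)
    # p(y | x) for each input value, then the output marginal p(y)
    p_y_given_x = np.array([poisson.pmf(y, gamma * x) for x in x_vals])
    p_y = p_x @ p_y_given_x
    # I(X; Y) = sum_{x,y} p(x) p(y|x) log[ p(y|x) / p(y) ]   (nats)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(p_y_given_x > 0, p_y_given_x / p_y, 1.0)
        terms = p_x[:, None] * p_y_given_x * np.log(ratio)
    return terms.sum()

x_vals = np.array([0.2, 2.0])     # arbitrary binary input alphabet
p_x = np.array([0.5, 0.5])        # arbitrary input law
for g in (0.5, 2.0, 10.0):
    print(g, poisson_channel_mi(x_vals, p_x, g))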
mutual information and MMSE
2 for GSV slide

Y = √γ · X + W,   where W is a standard Gaussian, independent of X

I(γ) = I(X; Y)

mmse(γ) = E[(X − E[X | Y])²]

[Guo, Shamai and Verdu 2005]:

(d/dγ) I(γ) = (1/2) · mmse(γ)

3 Introduction

In the seminal paper [13], Guo, Shamai and Verdu discovered that the derivative of the mutual information between the input and the output in a real-valued scalar Gaussian channel, with respect to the signal-to-noise ratio (SNR), is equal to half the minimum mean square error (MMSE) in estimating the input based on the output. This simple relationship holds regardless of the input distribution, and carries over essentially verbatim to vectors, as well as to the continuous-time Additive White Gaussian Noise (AWGN) channel (cf. [34, 21] for even more general settings where this relationship holds). When combined with Duncan's theorem [7], it was also shown to imply a remarkable relationship between the MMSEs in causal (filtering) and non-causal (smoothing) estimation of an arbitrarily distributed continuous-time signal corrupted by Gaussian noise: the filtering MMSE at SNR level γ is equal to the mean value of the smoothing MMSE with SNR uniformly distributed between 0 and γ. The relation of the mutual information to both types of MMSE thus served as a bridge between the two quantities.

More recently, Verdu has shown in [31] that when X ∼ P is estimated based on Y by a mismatched estimator that would have minimized the MSE had X ∼ Q, the integral over all SNR values up to γ of the excess MSE due to the mismatch is equal to twice the relative entropy between the true channel output distribution and the channel output distribution under Q, at SNR = γ.

This result was key in [33], where it was shown that the relationship between the causal and non-causal MMSEs continues to hold also in the mismatched case, i.e., when the filters are optimized for an underlying signal distribution that differs from the true one. The bridge between the two sides of the equality in this mismatched case was shown to be the sum of the mutual information and the relative entropy between the true and mismatched output distributions, this relative entropy thus quantifying the penalty due to mismatch.

Consider now the Poisson channel, by which we mean, for the case of scalar random variables, that X, the input, is a non-negative random variable while the conditional distribution of the output Y given the input is given by Poisson(γ · X), the parameter γ ≥ 0 here playing the role of SNR. In the continuous time setting, the channel input is X^T = {X_t, 0 ≤ t ≤ T}, a non-negative stochastic process, and conditionally on X^T, the output Y^T = {Y_t, 0 ≤ t ≤ T} is a non-homogeneous Poisson process with intensity function γ · X^T. Often referred to as the "ideal Poisson channel" [19], this model is the canonical one for describing direct detection optical communication: the channel input represents the squared magnitude of the electric field incident on the photo-detector, while its output is the counting process describing the arrival times of the photons registered by the detector. Here the energy of the channel input signal is proportional to its l1 norm, rather than the l2 norm as in the Gaussian channel.
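A minimal added check (mine, not from the notes) of the GSV relation in a case where both sides are available in closed form, a Gaussian input X ∼ N(0, σ²), for which I(γ) = ½ log(1 + γσ²) and mmse(γ) = σ²/(1 + γσ²); the Monte Carlo part re-estimates mmse(γ) directly from the channel.

# Added sketch (not from the notes): dI/dγ = mmse(γ)/2 for a Gaussian input.
import numpy as np

sigma2 = 2.0                                   # input variance, arbitrary choice
I = lambda g: 0.5 * np.log1p(g * sigma2)
mmse = lambda g: sigma2 / (1.0 + g * sigma2)

g = np.linspace(0.1, 5.0, 50)
h = 1e-5
dI = (I(g + h) - I(g - h)) / (2 * h)           # numerical derivative of I(γ)
print(np.max(np.abs(dI - 0.5 * mmse(g))))      # ≈ 0

# Monte Carlo re-estimate of mmse(γ) directly from Y = √γ·X + W:
rng = np.random.default_rng(0)
gamma = 1.7
X = rng.normal(0.0, np.sqrt(sigma2), size=10**6)
Y = np.sqrt(gamma) * X + rng.normal(size=X.size)
Xhat = (np.sqrt(gamma) * sigma2 / (1.0 + gamma * sigma2)) * Y   # E[X|Y] for Gaussian X
print(np.mean((X - Xhat) ** 2), mmse(gamma))                    # ≈ equal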
(follows from J-MMSE and De-Bruijn)
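A compact way to see this (a sketch added here, not spelled out in the extracted slides; it assumes the standard normalization W ∼ N(0,1) and natural logarithms), combining the relation between Fisher information J and MMSE with de Bruijn's identity:

\begin{align*}
Y_\gamma &= \sqrt{\gamma}\,X + W \;=\; \sqrt{\gamma}\,\bigl(X + \sqrt{t}\,W\bigr), \qquad t = 1/\gamma,\quad W \sim \mathcal{N}(0,1),\\
J\bigl(X+\sqrt{t}\,W\bigr) &= \tfrac{1}{t}\Bigl(1 - \tfrac{\mathrm{mmse}(\gamma)}{t}\Bigr) \;=\; \gamma\bigl(1-\gamma\,\mathrm{mmse}(\gamma)\bigr) \qquad \text{(J--MMSE relation)},\\
\tfrac{d}{dt}\,h\bigl(X+\sqrt{t}\,W\bigr) &= \tfrac{1}{2}\,J\bigl(X+\sqrt{t}\,W\bigr) \qquad \text{(de Bruijn's identity)},\\
I(\gamma) &= h(Y_\gamma) - h(W) = h\bigl(X+\sqrt{t}\,W\bigr) + \tfrac{1}{2}\log\gamma - h(W),\\
\tfrac{d}{d\gamma}\,I(\gamma) &= -\tfrac{1}{\gamma^{2}}\cdot\tfrac{1}{2}\,J\bigl(X+\sqrt{t}\,W\bigr) + \tfrac{1}{2\gamma} \;=\; \tfrac{1}{2}\,\mathrm{mmse}(\gamma).
\end{align*}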
continuous time
1 for Duncan slide

AWGN channel:

dY_t = X_t dt + dW_t,   0 ≤ t ≤ T

W is standard white Gaussian noise, independent of X.

[Duncan 1970]:

I(X^T; Y^T) = (1/2) E[ ∫_0^T (X_t − E[X_t | Y^t])² dt ]

With γ playing the role of SNR:

dY_t = √γ X_t dt + dW_t,   0 ≤ t ≤ T

I(γ) = I(X^T; Y^T)

cmmse(γ) = E[ ∫_0^T (X_t − E[X_t | Y^t])² dt ]

so that [Duncan 1970] reads

I(γ) = (γ/2) · cmmse(γ)

The corresponding noncausal (smoothing) MMSE is

mmse(γ) = E[ ∫_0^T (X_t − E[X_t | Y^T])² dt ]
or, in its integral version,

I(snr) = (1/2) ∫_0^snr mmse(γ) dγ
[Zakai 2005]
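A worked example (added here; it is not in the extracted slides) that makes the continuous-time relations concrete, assuming the signal is a random constant X_t ≡ X ∼ N(0, σ²) held fixed over [0, T]:

\begin{align*}
\mathrm{mmse}(\gamma) &= \int_0^T \frac{\sigma^2}{1+\gamma\sigma^2 T}\,dt = \frac{\sigma^2 T}{1+\gamma\sigma^2 T}, \qquad
\mathrm{cmmse}(\gamma) = \int_0^T \frac{\sigma^2}{1+\gamma\sigma^2 t}\,dt = \frac{1}{\gamma}\,\log\bigl(1+\gamma\sigma^2 T\bigr),\\
I(\gamma) &= \tfrac{1}{2}\,\log\bigl(1+\gamma\sigma^2 T\bigr)
\;=\; \tfrac{\gamma}{2}\,\mathrm{cmmse}(\gamma)
\;=\; \tfrac{1}{2}\int_0^{\gamma} \mathrm{mmse}(s)\,ds .
\end{align*}

In particular, cmmse(snr) = (1/snr) ∫_0^snr mmse(γ) dγ in this example, in line with the causal/noncausal relation quoted in the Introduction.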
H(X) = Σ_{x∈𝒳} P(X = x) log( 1 / P(X = x) )

For mismatch:

cmse_{P,Q}(γ) = E_P[ ∫_0^T (X_t − E_Q[X_t | Y^t])² dt ]

mse_{P,Q}(γ) = E_P[ ∫_0^T (X_t − E_Q[X_t | Y^T])² dt ]

Relationship between cmse_{P,Q} and mse_{P,Q}?

cmse_{P,Q}(snr) = (1/snr) ∫_0^snr mse_{P,Q}(γ) dγ = (2/snr) [ I(snr) + D(P_Y‖Q_Y) ]
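An added numerical sketch (not in the notes) of the scalar-channel mismatch identity of [31] quoted in the Introduction, in the form D(P_Y‖Q_Y) at SNR = snr equals ½ ∫_0^snr [mse_{P,Q}(γ) − mmse_P(γ)] dγ, checked with Gaussian P = N(0, σ_P²) and Q = N(0, σ_Q²), where every quantity has a closed form; the variances are arbitrary choices.

# Added sketch (not from the notes): Verdu's scalar mismatch relation,
#   D(P_Y || Q_Y) at SNR = snr  =  1/2 * ∫_0^snr [mse_{P,Q}(γ) − mmse_P(γ)] dγ,
# checked for Gaussian P = N(0, sP2) and Q = N(0, sQ2).
import numpy as np

sP2, sQ2 = 1.5, 0.6                       # true / assumed input variances (arbitrary)
vP = lambda g: 1.0 + g * sP2              # output variance under P at SNR g
vQ = lambda g: 1.0 + g * sQ2              # output variance under Q at SNR g

mmse_P = lambda g: sP2 / vP(g)                          # matched MMSE under P
mse_PQ = lambda g: (sP2 + g * sQ2**2) / vQ(g) ** 2      # MSE of the Q-optimal estimator under P
D = lambda g: 0.5 * (np.log(vQ(g) / vP(g)) + vP(g) / vQ(g) - 1.0)   # D(N(0,vP) || N(0,vQ))

snr = 3.0
g = np.linspace(0.0, snr, 200_001)
excess = mse_PQ(g) - mmse_P(g)
half_integral = 0.5 * np.sum(0.5 * (excess[1:] + excess[:-1]) * np.diff(g))  # trapezoid rule
print(D(snr), half_integral)              # ≈ equal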
Duncan
SNR in Duncan
1 for Duncan slide
AWGN channeldYt = Xtdt+ dWt, 0 ≤ t ≤ T
W is white Gaussian noise, independent of X[Duncan 1970]:
I(XT ;Y T ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
dYt =√γXtdt+ dWt, 0 ≤ t ≤ T
I(γ) = I(XT ;Y T )
cmmse(γ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
I(XT ;Y T ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
[Duncan 1970]:
I(γ) =1
2cmmse(γ)
2 Introduction
In the seminal paper [13], Guo, Shamai and Verdu discovered that the derivative of the mutual information betweenthe input and the output in a real-valued scalar Gaussian channel, with respect to the signal-to-noise ratio (SNR),is equal to the minimum mean square error (MMSE) in estimating the input based on the output. This simplerelationship holds regardless of the input distribution, and carries over essentially verbatim to vectors, as well as thecontinuous-time Additive White Gaussian Noise (AWGN) channel (cf. [34, 21] for even more general settings wherethis relationship holds). When combined with Duncan’s theorem [7], it was also shown to imply a remarkable rela-tionship between the MMSEs in causal (filtering) and non-causal (smoothing) estimation of an arbitrarily distributedcontinuous-time signal corrupted by Gaussian noise: the filtering MMSE at SNR level γ is equal to the mean valueof the smoothing MMSE with SNR uniformly distributed between 0 and γ. The relation of the mutual informationto both types of MMSE thus served as a bridge between the two quantities.
More recently, Verdu has shown in [31] that when X ∼ P is estimated based on Y by a mismatched estimatorthat would have minimized the MSE had X ∼ Q, the integral over all SNR values up to γ of the excess MSE due tothe mismatch is equal to the relative entropy between the true channel output distribution and the channel outputdistribution under Q, at SNR = γ.
This result was key in [33], where it was shown that the relationship between the causal and non-causal MMSEscontinues to hold also in the mismatched case, i.e. when the filters are optimized for an underlying signal distributionthat differs from the true one. The bridge between the two sides of the equality in this mismatched case was shown tobe the sum of the mutual information and the relative entropy between the true and mismatched output distributions,this relative entropy thus quantifying the penalty due to mismatch.
Consider now the Poisson channel, by which we mean, for the case of scalar random variables, that X, theinput, is a non-negative random variable while the conditional distribution of the output Y given the input isgiven by Poisson(γ · X), the parameter γ ≥ 0 here playing the role of SNR. In the continuous time setting, thechannel input is XT = {Xt, 0 ≤ t ≤ T}, a non-negative stochastic process, and conditionally on XT , the outputY T = {Yt, 0 ≤ t ≤ T} is a non-homogeneous Poisson process with intensity function γ ·XT . Often referred to as the“ideal Poisson channel” [19], this model is the canonical one for describing direct detection optical communication:The channel input represents the squared magnitude of the electric field incident on the photo-detector, while itsoutput is the counting process describing the arrival times of the photons registered by the detector. Here the energyof the channel input signal is proportional to its l1 norm, rather than the l2 norm as in the Gaussian channel. Thus it
2
1 for Duncan slide
AWGN channeldYt = Xtdt+ dWt, 0 ≤ t ≤ T
W is white Gaussian noise, independent of X[Duncan 1970]:
I(XT ;Y T ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
dYt =√γXtdt+ dWt, 0 ≤ t ≤ T
I(γ) = I(XT ;Y T )
cmmse(γ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
I(XT ;Y T ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
[Duncan 1970]:
I(γ) =1
2cmmse(γ)
2 Introduction
In the seminal paper [13], Guo, Shamai and Verdu discovered that the derivative of the mutual information betweenthe input and the output in a real-valued scalar Gaussian channel, with respect to the signal-to-noise ratio (SNR),is equal to the minimum mean square error (MMSE) in estimating the input based on the output. This simplerelationship holds regardless of the input distribution, and carries over essentially verbatim to vectors, as well as thecontinuous-time Additive White Gaussian Noise (AWGN) channel (cf. [34, 21] for even more general settings wherethis relationship holds). When combined with Duncan’s theorem [7], it was also shown to imply a remarkable rela-tionship between the MMSEs in causal (filtering) and non-causal (smoothing) estimation of an arbitrarily distributedcontinuous-time signal corrupted by Gaussian noise: the filtering MMSE at SNR level γ is equal to the mean valueof the smoothing MMSE with SNR uniformly distributed between 0 and γ. The relation of the mutual informationto both types of MMSE thus served as a bridge between the two quantities.
More recently, Verdu has shown in [31] that when X ∼ P is estimated based on Y by a mismatched estimatorthat would have minimized the MSE had X ∼ Q, the integral over all SNR values up to γ of the excess MSE due tothe mismatch is equal to the relative entropy between the true channel output distribution and the channel outputdistribution under Q, at SNR = γ.
This result was key in [33], where it was shown that the relationship between the causal and non-causal MMSEscontinues to hold also in the mismatched case, i.e. when the filters are optimized for an underlying signal distributionthat differs from the true one. The bridge between the two sides of the equality in this mismatched case was shown tobe the sum of the mutual information and the relative entropy between the true and mismatched output distributions,this relative entropy thus quantifying the penalty due to mismatch.
Consider now the Poisson channel, by which we mean, for the case of scalar random variables, that X, theinput, is a non-negative random variable while the conditional distribution of the output Y given the input isgiven by Poisson(γ · X), the parameter γ ≥ 0 here playing the role of SNR. In the continuous time setting, thechannel input is XT = {Xt, 0 ≤ t ≤ T}, a non-negative stochastic process, and conditionally on XT , the outputY T = {Yt, 0 ≤ t ≤ T} is a non-homogeneous Poisson process with intensity function γ ·XT . Often referred to as the“ideal Poisson channel” [19], this model is the canonical one for describing direct detection optical communication:The channel input represents the squared magnitude of the electric field incident on the photo-detector, while itsoutput is the counting process describing the arrival times of the photons registered by the detector. Here the energyof the channel input signal is proportional to its l1 norm, rather than the l2 norm as in the Gaussian channel. Thus it
2
1 for Duncan slide
AWGN channeldYt = Xtdt+ dWt, 0 ≤ t ≤ T
W is white Gaussian noise, independent of X[Duncan 1970]:
I(XT ;Y T ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
dYt =√γXtdt+ dWt, 0 ≤ t ≤ T
I(γ) = I(XT ;Y T )
cmmse(γ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
I(XT ;Y T ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
[Duncan 1970]:
I(γ) =1
2cmmse(γ)
2 Introduction
In the seminal paper [13], Guo, Shamai and Verdu discovered that the derivative of the mutual information betweenthe input and the output in a real-valued scalar Gaussian channel, with respect to the signal-to-noise ratio (SNR),is equal to the minimum mean square error (MMSE) in estimating the input based on the output. This simplerelationship holds regardless of the input distribution, and carries over essentially verbatim to vectors, as well as thecontinuous-time Additive White Gaussian Noise (AWGN) channel (cf. [34, 21] for even more general settings wherethis relationship holds). When combined with Duncan’s theorem [7], it was also shown to imply a remarkable rela-tionship between the MMSEs in causal (filtering) and non-causal (smoothing) estimation of an arbitrarily distributedcontinuous-time signal corrupted by Gaussian noise: the filtering MMSE at SNR level γ is equal to the mean valueof the smoothing MMSE with SNR uniformly distributed between 0 and γ. The relation of the mutual informationto both types of MMSE thus served as a bridge between the two quantities.
More recently, Verdu has shown in [31] that when X ∼ P is estimated based on Y by a mismatched estimatorthat would have minimized the MSE had X ∼ Q, the integral over all SNR values up to γ of the excess MSE due tothe mismatch is equal to the relative entropy between the true channel output distribution and the channel outputdistribution under Q, at SNR = γ.
This result was key in [33], where it was shown that the relationship between the causal and non-causal MMSEscontinues to hold also in the mismatched case, i.e. when the filters are optimized for an underlying signal distributionthat differs from the true one. The bridge between the two sides of the equality in this mismatched case was shown tobe the sum of the mutual information and the relative entropy between the true and mismatched output distributions,this relative entropy thus quantifying the penalty due to mismatch.
Consider now the Poisson channel, by which we mean, for the case of scalar random variables, that X, theinput, is a non-negative random variable while the conditional distribution of the output Y given the input isgiven by Poisson(γ · X), the parameter γ ≥ 0 here playing the role of SNR. In the continuous time setting, thechannel input is XT = {Xt, 0 ≤ t ≤ T}, a non-negative stochastic process, and conditionally on XT , the outputY T = {Yt, 0 ≤ t ≤ T} is a non-homogeneous Poisson process with intensity function γ ·XT . Often referred to as the“ideal Poisson channel” [19], this model is the canonical one for describing direct detection optical communication:The channel input represents the squared magnitude of the electric field incident on the photo-detector, while itsoutput is the counting process describing the arrival times of the photons registered by the detector. Here the energyof the channel input signal is proportional to its l1 norm, rather than the l2 norm as in the Gaussian channel. Thus it
2
1 for Duncan slide
AWGN channeldYt = Xtdt+ dWt, 0 ≤ t ≤ T
W is white Gaussian noise, independent of X[Duncan 1970]:
I(XT ;Y T ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
dYt =√γXtdt+ dWt, 0 ≤ t ≤ T
I(γ) = I(XT ;Y T )
cmmse(γ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
I(XT ;Y T ) =1
2E
�� T
0(Xt − E[Xt|Y t])2dt
�
[Duncan 1970]:
I(γ) =γ
2· cmmse(γ)
2 Introduction
In the seminal paper [13], Guo, Shamai and Verdu discovered that the derivative of the mutual information betweenthe input and the output in a real-valued scalar Gaussian channel, with respect to the signal-to-noise ratio (SNR),is equal to the minimum mean square error (MMSE) in estimating the input based on the output. This simplerelationship holds regardless of the input distribution, and carries over essentially verbatim to vectors, as well as thecontinuous-time Additive White Gaussian Noise (AWGN) channel (cf. [34, 21] for even more general settings wherethis relationship holds). When combined with Duncan’s theorem [7], it was also shown to imply a remarkable rela-tionship between the MMSEs in causal (filtering) and non-causal (smoothing) estimation of an arbitrarily distributedcontinuous-time signal corrupted by Gaussian noise: the filtering MMSE at SNR level γ is equal to the mean valueof the smoothing MMSE with SNR uniformly distributed between 0 and γ. The relation of the mutual informationto both types of MMSE thus served as a bridge between the two quantities.
More recently, Verdu has shown in [31] that when X ∼ P is estimated based on Y by a mismatched estimatorthat would have minimized the MSE had X ∼ Q, the integral over all SNR values up to γ of the excess MSE due tothe mismatch is equal to the relative entropy between the true channel output distribution and the channel outputdistribution under Q, at SNR = γ.
This result was key in [33], where it was shown that the relationship between the causal and non-causal MMSEscontinues to hold also in the mismatched case, i.e. when the filters are optimized for an underlying signal distributionthat differs from the true one. The bridge between the two sides of the equality in this mismatched case was shown tobe the sum of the mutual information and the relative entropy between the true and mismatched output distributions,this relative entropy thus quantifying the penalty due to mismatch.
Consider now the Poisson channel, by which we mean, for the case of scalar random variables, that X, theinput, is a non-negative random variable while the conditional distribution of the output Y given the input isgiven by Poisson(γ · X), the parameter γ ≥ 0 here playing the role of SNR. In the continuous time setting, thechannel input is XT = {Xt, 0 ≤ t ≤ T}, a non-negative stochastic process, and conditionally on XT , the outputY T = {Yt, 0 ≤ t ≤ T} is a non-homogeneous Poisson process with intensity function γ ·XT . Often referred to as the“ideal Poisson channel” [19], this model is the canonical one for describing direct detection optical communication:The channel input represents the squared magnitude of the electric field incident on the photo-detector, while itsoutput is the counting process describing the arrival times of the photons registered by the detector. Here the energyof the channel input signal is proportional to its l1 norm, rather than the l2 norm as in the Gaussian channel. Thus it
2 for GSV slide

Y = √γ · X + W, where W is a standard Gaussian, independent of X
I(γ) = I(X; Y),  mmse(γ) = E[(X − E[X|Y])²]
[Guo, Shamai and Verdu 2005]:
d/dγ I(γ) = ½ mmse(γ)

In continuous time ([Guo, Shamai and Verdu 2005], [Zakai 2005]), with mmse(γ) = E[ ∫₀^T (X_t − E[X_t|Y^T])² dt ], the same relation holds; or, in its integral version,

I(snr) = ½ ∫₀^snr mmse(γ) dγ.
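As a quick numerical sanity check of the scalar relation d/dγ I(γ) = ½ mmse(γ), here is a Monte Carlo sketch (not part of the source). It assumes an equiprobable binary input X ∈ {−1, +1}, chosen purely because E[X|Y] = tanh(√γ Y) is then available in closed form; the same noise samples are reused across SNR values so that the finite-difference derivative is stable.

import numpy as np

rng = np.random.default_rng(0)
n = 400_000
x = rng.choice([-1.0, 1.0], size=n)
w = rng.standard_normal(n)                        # common noise reused across gamma values

def mutual_info(gamma):                           # I(X;Y) in nats for Y = sqrt(gamma) X + W
    y = np.sqrt(gamma) * x + w
    log_p_y_given_x = -0.5 * (y - np.sqrt(gamma) * x) ** 2
    p_y = 0.5 * (np.exp(-0.5 * (y - np.sqrt(gamma)) ** 2)
                 + np.exp(-0.5 * (y + np.sqrt(gamma)) ** 2))
    return np.mean(log_p_y_given_x - np.log(p_y))  # Gaussian normalizing constants cancel

def mmse(gamma):                                   # E[(X - E[X|Y])^2] with E[X|Y] = tanh(sqrt(g) Y)
    y = np.sqrt(gamma) * x + w
    return np.mean((x - np.tanh(np.sqrt(gamma) * y)) ** 2)

g, h = 1.0, 1e-3
print((mutual_info(g + h) - mutual_info(g - h)) / (2 * h))   # numerical d/dgamma I(gamma)
print(0.5 * mmse(g))                                          # both numbers agree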
For mismatch:

cmse_{P,Q}(γ) = E_P[ ∫₀^T (X_t − E_Q[X_t|Y^t])² dt ]
mse_{P,Q}(γ) = E_P[ ∫₀^T (X_t − E_Q[X_t|Y^T])² dt ]

Relationship between cmse_{P,Q} and mse_{P,Q}?
cmmse(snr) = (1/snr) ∫₀^snr mmse(γ) dγ

Relationship between cmmse and mmse? Combining Duncan's theorem, cmmse(snr) = (2/snr) I(snr), with the integral version of the I-MMSE relation gives exactly the identity above: the causal MMSE at SNR level snr is the average of the noncausal MMSE over SNR levels uniformly distributed between 0 and snr.
What if X ∼ P but the estimator thinks X ∼ Q?

mse_{P,Q}(γ) = E_P[ (X − E_Q[X|Y])² ]

What is the Cost of Mismatch?
A new representation of relative entropy [Verdu 2010]:
D(P ‖ Q) = ½ ∫₀^∞ [ mse_{P,Q}(γ) − mse_{P,P}(γ) ] dγ

and, at any finite SNR level,

D(P_{Y_snr} ‖ Q_{Y_snr}) = ½ ∫₀^snr [ mse_{P,Q}(γ) − mse_{P,P}(γ) ] dγ,

or, in differential form,

d/dγ D(P_{Y_γ} ‖ Q_{Y_γ}) = ½ [ mse_{P,Q}(γ) − mse_{P,P}(γ) ].
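A minimal numerical sketch of the relation above (not from the source): it assumes the true prior P = N(0, 1) and a mismatched prior Q = N(0, s²), chosen purely because both conditional-mean estimators are then linear, and the matched MSE, the mismatched MSE and D(P‖Q) all have closed forms.

import numpy as np
from scipy.integrate import quad

# Check D(P||Q) = (1/2) * integral over gamma of [mse_{P,Q} - mse_{P,P}]
# for the scalar channel Y = sqrt(gamma) X + W, W ~ N(0,1).
s2 = 4.0   # variance assumed by the mismatched estimator (illustrative choice)

def mse_matched(g):            # mse_{P,P}(gamma) for X ~ N(0,1)
    return 1.0 / (1.0 + g)

def mse_mismatched(g):         # mse_{P,Q}(gamma): estimator E_Q[X|Y] = sqrt(g) s2 Y / (1 + g s2)
    if g <= 0.0:
        return 1.0
    b = g * s2 / (1.0 + g * s2)            # shrinkage factor applied to sqrt(g) Y
    return (1.0 - b) ** 2 + b ** 2 / g

integral, _ = quad(lambda g: mse_mismatched(g) - mse_matched(g), 0.0, np.inf)
kl = 0.5 * (np.log(s2) + 1.0 / s2 - 1.0)   # D(N(0,1) || N(0,s2)) in nats

print(0.5 * integral, kl)                  # the two numbers agree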
Causal vs. Non-causal Mismatched Estimation
cmse_{P,Q}(snr) = (1/snr) ∫₀^snr mse_{P,Q}(γ) dγ = (2/snr) [ I(snr) + D(P_{Y^T} ‖ Q_{Y^T}) ],

which answers the question of how cmse_{P,Q} relates to mse_{P,Q}: the causal mismatched MSE at SNR level snr is the average over SNR of the noncausal mismatched MSE, and the bridge between the two is the sum of the mutual information and the relative entropy between the true and mismatched output distributions.
minimax estimation
Theorem 6.4 Let P and Q be two probability measures that are members of P. For γ ≥ 0,

D(P_{Y^T_γ} ‖ Q_{Y^T_γ}) = γ · [ cmle_{P,Q}(γ) − cmle_{P,P}(γ) ].  (27)

Theorem 6.5 (under mild conditions)

D(P_{Y^T} ‖ Q_{Y^T}) ∝ cmle_{P,Q} − cmle_{P,P}  (28)
cmse_{P,Q} − cmse_{P,P} = D(P_{Y^T} ‖ Q_{Y^T})  (29)

• Girsanov-type theory for expressing log dQ_{Y^T}/d(law of homogeneous Poisson) as a filtering integral
• manipulating
D(P_{Y^T} ‖ Q_{Y^T}) = E_P[ log dP_{Y^T}/d(law of homogeneous Poisson) − log dQ_{Y^T}/d(law of homogeneous Poisson) ]
via 'orthogonality' etc.

Put together, Theorem 6.3 and Theorem 6.5 yield, for γ > 0,

cmle_{P,Q}(γ) − cmle_{P,P}(γ) = (1/γ) ∫₀^γ [ mle_{P,Q}(α) − mle_{P,P}(α) ] dα = (1/γ) D(P_{Y^T_γ} ‖ Q_{Y^T_γ}),  (30)

which is the Poissonian analogue of [33, Theorem 2]. On a technical note, the r.h.s. of (24), (25) and (26) are well-defined as integrals of non-negative Borel measurable functions, as will follow from our treatment in Section 9.

6.4 for slides: minimaxity

minimax(P, snr) ≜ min over filters {X̂_t(·)}_{0≤t≤T} of max_{P∈P} { E_P[ ∫₀^T ℓ(X_t, X̂_t(Y^t)) dt ] − cmse_{P,P}(snr) }

or, equivalently, in terms of the causal MSE attained by an arbitrary filter X̂,

minimax(P, snr) ≜ min_{X̂(·)} max_{P∈P} { cmse_{P,X̂}(snr) − cmse_{P,P}(snr) }

minimax(P, snr) = min_Q max_{P∈P} [ cmse_{P,Q}(snr) − cmse_{P,P}(snr) ]  (31)
= (2/snr) min_Q max_{P∈P} D(P_{Y^T_snr} ‖ Q_{Y^T_snr})  (32)
= (2/snr) max { I(Θ; Y^T_snr) : Θ is a P-valued RV }  (33)
= (2/snr) C({P_{Y^T_snr}}_{P∈P})  (34)

Furthermore, the 'strong redundancy-capacity' results are directly applicable here and imply:

6.5 strong red cap

For every ε > 0 and any filter {X̂_t(·)}_{0≤t≤T},

E_P[ ∫₀^T ℓ(X_t, X̂_t(Y^t)) dt ] − cmse_{P,P}(snr) ≥ (1 − ε) · minimax(P, snr)  (35)

for all P ∈ P with the possible exception of sources in a subset B ⊂ P where

w*(B) ≤ e · 2^{−ε·C({P_{Y^T_snr}}_{P∈P})},  (36)
w*(B) ≤ e · 2^{−ε·minimax(P,snr)},

w* being the capacity-achieving prior.
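The passage from (32) to (34) is the redundancy-capacity identity: the minimax divergence from the family to a single representative Q equals the capacity of the "channel" from the family index Θ to the output, and the minimizing Q is the capacity-achieving output mixture. The sketch below (not from the source) illustrates this identity on a toy finite family; the family members' output laws are taken to be Poisson with a few illustrative intensities on a truncated alphabet, and the capacity is computed with the Blahut-Arimoto algorithm, which is only a finite, discrete caricature of the continuous-time quantity in the text.

import numpy as np
from scipy.stats import poisson

# Rows of W: the output law P_Y under each member of a hypothetical finite family.
lams = [1.0, 2.0, 5.0]                                   # illustrative intensities
ys = np.arange(0, 60)
W = np.array([poisson.pmf(ys, lam) for lam in lams])     # W[p, y] = P_Y(y) under member p
W = W / W.sum(axis=1, keepdims=True)                     # renormalize after truncation

w = np.ones(len(lams)) / len(lams)                       # prior over the family
for _ in range(500):                                     # Blahut-Arimoto iterations
    q = w @ W                                            # mixture output Q_Y
    d = np.sum(W * np.log(W / q), axis=1)                # D(P_Y || Q_Y) for each member
    w = w * np.exp(d)
    w /= w.sum()

q = w @ W
d = np.sum(W * np.log(W / q), axis=1)
C = w @ d                                                # capacity = max over priors of I(Theta;Y)
print(C, d.max())                                        # worst-case divergence to the
                                                         # capacity-achieving mixture equals C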
Strong Converse
For every ε > 0 and any filter X̂(·),

cmse_{P,X̂}(snr) − cmse_{P,P}(snr) ≥ (1 − ε) · minimax(P, snr)  (36)

for all P ∈ P with the possible exception of sources in a subset B ⊂ P where

w*(B) ≤ e · 2^{−ε·C({P_{Y^T_snr}}_{P∈P})},  (37)
w*(B) ≤ e · 2^{−ε·minimax(P,snr)},

w* being the capacity-achieving prior.
7 Implications

7.1 Mutual Information and Minimum Mean Estimation Loss

Let X be a non-negative random variable and, for γ > 0, let Y_γ be a non-negative integer-valued random variable, jointly distributed with X such that the conditional law of Y_γ given X is Poisson(γX). When specialized to this setting, Theorem 2 of [14] gives

d/dγ I(X; Y_γ) = E[ X log X − E[X|Y_γ] log E[X|Y_γ] ].  (38)

It is instructive to observe that the right hand side of (38) is nothing but the minimum mean loss in estimating X based on Y_γ under the loss function ℓ. Indeed, denoting this minimum mean loss by mmle(γ), i.e.,

mmle(γ) ≜ E[ ℓ(X, E[X|Y_γ]) ],  (39)

we have

E[ ℓ(X, E[X|Y_γ]) ] = E[ X log(X / E[X|Y_γ]) − X + E[X|Y_γ] ]  (40)
= E[ X log X − X log E[X|Y_γ] ]  (41)
= E[ X log X − E[X|Y_γ] log E[X|Y_γ] ].  (42)

Thus, (38) can be stated as the "I-MMLE" relationship

d/dγ I(X; Y_γ) = mmle(γ),  (43)

in complete analogy with the I-MMSE relationship of [13]. To see one immediate benefit of the realization that the right hand side of (38) coincides with the minimum mean loss in (39), we first go through the following data processing argument: Fix γ′ < γ, let {B_i}_{i≥1} be i.i.d. Bernoulli(γ′/γ), independent of (X, Y_γ), and note that (X, Σ_{i=1}^{Y_γ} B_i) is equal in distribution to (X, Y_{γ′}). Since estimating X based on Σ_{i=1}^{Y_γ} B_i, which is a function of Y_γ and the randomization sequence {B_i}, cannot be better (in the sense of minimizing the expected loss under ℓ) than estimating X based on Y_γ, we have mmle(γ′) ≥ mmle(γ). Thus, mmle(γ) is non-increasing in γ, which, when combined with (43), yields the following analogue of [13, Corollary 1]:

Corollary 7.1 I(X; Y_γ) is concave in γ.

It is also worth pointing out that the I-MMLE relationship can be viewed as a direct consequence of Theorem 6.2. Indeed, in the notation of Section 6.2, (43) is expressed as

d/dγ I_P(X; Y_γ) = mle_{P,P}(γ),  (44)
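Here is a small numerical sanity check of the I-MMLE relation (43) (not part of the source); the two-point prior on X and the truncation of the output alphabet are illustrative choices.

import numpy as np
from scipy.stats import poisson

xs = np.array([1.0, 3.0])            # hypothetical support of X
px = np.array([0.5, 0.5])            # hypothetical prior
ys = np.arange(0, 100)               # truncated output alphabet (truncation error is negligible here)

def joint(gamma):
    pyx = poisson.pmf(ys[None, :], gamma * xs[:, None])   # p(y|x)
    return pyx, px @ pyx                                   # p(y|x) and p(y)

def mutual_info(gamma):              # I(X; Y_gamma) in nats
    pyx, py = joint(gamma)
    return np.sum(px[:, None] * pyx * np.log(pyx / py))

def mmle(gamma):                     # E[ l(X, E[X|Y_gamma]) ] with l(x, xh) = x log(x/xh) - x + xh
    pyx, py = joint(gamma)
    post = px[:, None] * pyx / py                          # p(x|y)
    xhat = xs @ post                                       # E[X | Y_gamma = y]
    loss = xs[:, None] * np.log(xs[:, None] / xhat) - xs[:, None] + xhat
    return np.sum(px[:, None] * pyx * loss)

g, h = 2.0, 1e-4
print((mutual_info(g + h) - mutual_info(g - h)) / (2 * h))  # numerical d/dgamma I(X;Y_gamma)
print(mmle(g))                                               # both ≈ the same value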
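The thinning step used in the data processing argument above is also easy to see numerically; the sketch below (not from the source, with arbitrary illustrative parameters) checks that Bernoulli(γ′/γ)-thinning of Y_γ reproduces the law of Y_{γ′}.

import numpy as np

rng = np.random.default_rng(1)
gamma, gamma_p, x, n = 3.0, 1.2, 2.5, 500_000   # hypothetical parameters, with gamma_p < gamma

y_gamma = rng.poisson(gamma * x, size=n)                 # draws of Y_gamma given X = x
thinned = rng.binomial(y_gamma, gamma_p / gamma)         # sum of Y_gamma Bernoulli(gamma'/gamma) variables
y_gamma_p = rng.poisson(gamma_p * x, size=n)             # direct draws of Y_gamma'

print(thinned.mean(), y_gamma_p.mean())                  # both ≈ gamma' * x
print(thinned.var(), y_gamma_p.var())                    # both ≈ gamma' * x (Poisson)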
"Minimax Filtering via Relations Between Information and Estimation"
ISIT 2013, IEEE International Symposium on Information Theory
July 7-12, 2013, Istanbul, Turkey
Albert No and T. Weissman
lookahead
question
1 for Duncan slide

    H(X) = Σ_{x∈X} P(X = x) log (1 / P(X = x))

AWGN channel: dYt = Xt dt + dWt, 0 ≤ t ≤ T
W is standard white Gaussian noise, independent of X

[Duncan 1970]: For example, consider Duncan’s relationship

    I(X^T; Y^T) = (1/2) E[ ∫_0^T (Xt − E[Xt|Y^t])² dt ]

⇔

    E[ log ( dP_{X^T,Y^T} / d(P_{X^T} × P_{Y^T}) ) ] = (1/2) E[ ∫_0^T (Xt − E[Xt|Y^t])² dt ]

⇔

    E[ log ( dP_{X^T,Y^T} / d(P_{X^T} × P_{Y^T}) ) − (1/2) ∫_0^T (Xt − E[Xt|Y^t])² dt ] = 0

What else can we say about the random variable

    log ( dP_{X^T,Y^T} / d(P_{X^T} × P_{Y^T}) ) − (1/2) ∫_0^T (Xt − E[Xt|Y^t])² dt ?

    Var[ log ( dP_{X^T,Y^T} / d(P_{X^T} × P_{Y^T}) ) − (1/2) ∫_0^T (Xt − E[Xt|Y^t])² dt ] = ?

    Var[ log ( dP_{X^T,Y^T} / d(P_{X^T} × P_{Y^T}) ) − (1/2) ∫_0^T (Xt − E[Xt|Y^t])² dt ] = 2 I(X^T; Y^T) = E[ ∫_0^T (Xt − E[Xt|Y^t])² dt ]

dYt = √γ Xt dt + dWt, 0 ≤ t ≤ T

For stationary X = {Xt} let

    mmse(X, d, γ) = Var(X_0 | Y^d_{−∞}),

and let I(γ) here be the mutual information rate, I(γ) = I(X^T; Y^T).

Can I(·) determine lmmse(d, snr)? How about I(·) and Sx(·)?

We’ve seen that I(·) determines both mmse(X, 0, γ) and mmse(X, ∞, γ). Does I(·) determine mmse(X, d, γ) in general? No: in general mmse(X, d, γ) ≠ mmse(X^(r), d, γ), where X^(r) is the time-reversed X.
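For a concrete feel for Duncan’s relationship, here is a minimal Monte Carlo sketch for the simplest simulable case (an assumed toy example, not from the notes): a signal constant in time, Xt ≡ X ∼ N(0, σ²), observed through dYt = X dt + dWt. Here the causal estimate has the closed form E[X|Y^t] = σ²Yt/(1 + σ²t) and I(X^T;Y^T) = ½ log(1 + σ²T), so both sides of the identity can be compared directly; all parameters below are arbitrary.

```python
# Monte Carlo illustration of Duncan's relationship
#   I(X^T; Y^T) = (1/2) E ∫_0^T (X_t - E[X_t | Y^t])^2 dt
# for the toy model X_t ≡ X ~ N(0, sigma2), dY_t = X dt + dW_t.
import numpy as np

rng = np.random.default_rng(0)
sigma2, T, n_steps, n_paths = 1.5, 2.0, 1000, 4000
dt = T / n_steps
t = np.arange(1, n_steps + 1) * dt

X = rng.normal(0.0, np.sqrt(sigma2), size=(n_paths, 1))        # the constant signal
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))     # Brownian increments
Y = np.cumsum(X * dt + dW, axis=1)                             # Y_t on the time grid
Xhat = sigma2 * Y / (1.0 + sigma2 * t)                         # causal (filtering) estimate
causal_integral = ((X - Xhat) ** 2).sum(axis=1).mean() * dt    # ≈ E ∫ (X_t - X̂_t)² dt

print(0.5 * causal_integral)             # Monte Carlo estimate of the right-hand side
print(0.5 * np.log(1.0 + sigma2 * T))    # I(X^T; Y^T) for this model
```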
Monday, June 3, 2013
a time irreversible process
Monday, June 3, 2013
    d/dγ I(γ) = (1/2) mmse(γ),    mmse(γ) = E[ ∫_0^T (Xt − E[Xt|Y^T])² dt ]

or in its integral version

    I(snr) = (1/2) ∫_0^snr mmse(γ) dγ

    cmmse(snr) = (1/snr) ∫_0^snr mmse(γ) dγ
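The scalar counterpart of this relation, d/dγ I(X;Yγ) = ½ mmse(γ) for Yγ = √γ X + N with N ∼ N(0,1), is easy to check numerically; the binary prior below is an arbitrary illustrative choice, not anything prescribed by the notes.

```python
# Numerical check of the scalar I-MMSE relation d/dγ I(X; Y_γ) = (1/2) mmse(γ),
# with Y_γ = sqrt(γ) X + N(0,1) and X = ±1 equiprobable.
import numpy as np

y = np.linspace(-12, 12, 20001)                    # integration grid over the output
dy = y[1] - y[0]
phi = lambda z: np.exp(-z * z / 2) / np.sqrt(2 * np.pi)
trap = lambda f: (f.sum() - 0.5 * (f[0] + f[-1])) * dy   # trapezoid rule, uniform grid

def mutual_info(g):
    lik = {x: phi(y - np.sqrt(g) * x) for x in (-1.0, 1.0)}
    py = 0.5 * (lik[-1.0] + lik[1.0])
    return trap(sum(0.5 * lik[x] * np.log(lik[x] / py) for x in (-1.0, 1.0)))

def mmse(g):
    lik = {x: phi(y - np.sqrt(g) * x) for x in (-1.0, 1.0)}
    py = 0.5 * (lik[-1.0] + lik[1.0])
    xhat = 0.5 * (lik[1.0] - lik[-1.0]) / py       # E[X | Y_γ = y]
    return trap(sum(0.5 * lik[x] * (x - xhat) ** 2 for x in (-1.0, 1.0)))

g, h = 1.7, 1e-4
print((mutual_info(g + h) - mutual_info(g - h)) / (2 * h))   # numeric dI/dγ
print(0.5 * mmse(g))                                          # should match
```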
Relationship between cmmse and mmse?

What if X ∼ P but the estimator thinks X ∼ Q?

    mse_{P,Q}(γ) = E_P[ (X − E_Q[X|Y])² ]

What is the cost of mismatch?

    D(P || Q) = ∫_0^∞ [ mse_{P,Q}(γ) − mse_{P,P}(γ) ] dγ

    D(P_{Y_snr} || Q_{Y_snr}) = ∫_0^snr [ mse_{P,Q}(γ) − mse_{P,P}(γ) ] dγ

    d/dγ D(P_{Y_γ} || Q_{Y_γ}) = mse_{P,Q}(γ) − mse_{P,P}(γ)
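The mismatch identity can likewise be checked numerically in the scalar Gaussian channel. In the sketch below the two-point priors P and Q and the value of snr are arbitrary choices; the left side is computed by integrating the output densities, the right side by integrating the excess MSE over a γ grid.

```python
# Check of D(P_{Y_snr} || Q_{Y_snr}) = ∫_0^snr [mse_{P,Q}(γ) - mse_{P,P}(γ)] dγ
# for Y_γ = sqrt(γ) X + N(0,1) and two-point priors on {-1, +1}.
import numpy as np

y = np.linspace(-12, 12, 20001)
dy = y[1] - y[0]
phi = lambda z: np.exp(-z * z / 2) / np.sqrt(2 * np.pi)
trap = lambda f, dx: (f.sum() - 0.5 * (f[0] + f[-1])) * dx
P = {-1.0: 0.5, 1.0: 0.5}
Q = {-1.0: 0.2, 1.0: 0.8}

def out_density(prior, g):
    return sum(w * phi(y - np.sqrt(g) * x) for x, w in prior.items())

def mse(true_prior, est_prior, g):
    xhat = sum(x * w * phi(y - np.sqrt(g) * x) for x, w in est_prior.items()) / out_density(est_prior, g)
    integrand = sum(w * phi(y - np.sqrt(g) * x) * (x - xhat) ** 2 for x, w in true_prior.items())
    return trap(integrand, dy)

snr = 2.0
pY, qY = out_density(P, snr), out_density(Q, snr)
lhs = trap(pY * np.log(pY / qY), dy)                         # output relative entropy at snr

gammas = np.linspace(1e-4, snr, 400)
excess = np.array([mse(P, Q, g) - mse(P, P, g) for g in gammas])
rhs = trap(excess, gammas[1] - gammas[0])
print(lhs, rhs)                                              # the two should agree closely
```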
3 Introduction
In the seminal paper [13], Guo, Shamai and Verdu discovered that the derivative of the mutual information between the input and the output in a real-valued scalar Gaussian channel, with respect to the signal-to-noise ratio (SNR), is equal to the minimum mean square error (MMSE) in estimating the input based on the output. This simple relationship holds regardless of the input distribution, and carries over essentially verbatim to vectors, as well as the continuous-time Additive White Gaussian Noise (AWGN) channel (cf. [34, 21] for even more general settings where this relationship holds). When combined with Duncan's theorem [7], it was also shown to imply a remarkable relationship between the MMSEs in causal (filtering) and non-causal (smoothing) estimation of an arbitrarily distributed continuous-time signal corrupted by Gaussian noise: the filtering MMSE at SNR level γ is equal to the mean value of the smoothing MMSE with SNR uniformly distributed between 0 and γ. The relation of the mutual information to both types of MMSE thus served as a bridge between the two quantities.
More recently, Verdu has shown in [31] that when X ∼ P is estimated based on Y by a mismatched estimator that would have minimized the MSE had X ∼ Q, the integral over all SNR values up to γ of the excess MSE due to the mismatch is equal to the relative entropy between the true channel output distribution and the channel output distribution under Q, at SNR = γ.
Monday, June 3, 2013
Poisson Channel
6 Relative Entropy and Mismatched Estimation
6.1 For slides
• Scalar Channel: X ≥ 0, Yγ | X ∼ Poisson(γ · X)
• Continuous-time Channel: X^T a non-negative stochastic process; Y^T_γ | X^T non-homogeneous Poisson of intensity γ · X^T

• Note:

    D(exp(λ1) || exp(λ2)) = (1/λ1) · ℓ(λ1, λ2)

Compare with

    D(N(µ1, σ²) || N(µ2, σ²)) = (1/(2σ²)) · (µ1 − µ2)²

•

    I(X^T; Y^T_γ) = γ · E[ ∫_0^T ℓ(Xt, E[Xt|Y^t_γ]) dt ]

    cmmle(γ) = E[ ∫_0^T ℓ(Xt, E[Xt|Y^t_γ]) dt ],    mmle(γ) = E[ ∫_0^T ℓ(Xt, E[Xt|Y^T_γ]) dt ]

    cmmle(snr) = (1/snr) ∫_0^snr mmle(γ) dγ

Relationship between cmmle and mmle

For X independent of Z ∼ N(0, 1):

    d/dt h(X + √t Z) = (1/2) J(X + √t Z)
6.2 Random Variables

Suppose that X is a non-negative random variable and the conditional law of a r.v. Yγ, given X, is Poisson(γX). If X ∼ P, denote expectation w.r.t. the corresponding joint law of X and Yγ by EP, the distribution of Yγ by PYγ, the conditional expectation by EP[X|Yγ], etc. We denote the mutual information by IP(X;Yγ) or simply I(X;Yγ) when there is no ambiguity. Let further mleP,Q(γ) denote the mean loss under ℓ in estimating X based on Yγ using the estimator that would have been optimal had X ∼ Q when in fact X ∼ P, i.e.,

    mle_{P,Q}(γ) ≜ E_P[ ℓ(X, E_Q[X|Yγ]) ].    (20)

The following is a new representation of relative entropy, paralleling the Gaussian channel result of [31]:

Theorem 6.1 For any pair P, Q of probability measures over [a, b], where 0 < a < b < ∞,

    D(P || Q) = ∫_0^∞ [ mle_{P,Q}(γ) − mle_{P,P}(γ) ] dγ.    (21)

Theorem 6.1 is a direct consequence of the fact (proved in Section 9) that

    lim_{γ→∞} D(P_{Yγ} || Q_{Yγ}) = D(P || Q),    (22)

combined with the following result, which is the Poisson parallel of [31, Equation (24)]:
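For discrete priors, Theorem 6.1 lends itself to a direct numerical check, since mleP,Q(γ) is just a sum over the Poisson output alphabet. In the sketch below the two-point priors, the truncation of the γ-integral and the output cutoff are arbitrary illustrative choices.

```python
# Numerical check of D(P||Q) = ∫_0^∞ [mle_{P,Q}(γ) - mle_{P,P}(γ)] dγ
# for the Poisson channel Y_γ | X ~ Poisson(γX) and two-point priors on {a, b}.
from math import exp, log, lgamma
import numpy as np

a, b = 0.5, 2.0
P = {a: 0.3, b: 0.7}
Q = {a: 0.6, b: 0.4}

def pois(y, lam):
    return exp(y * log(lam) - lam - lgamma(y + 1))

def loss(x, xhat):
    return x * log(x / xhat) - x + xhat

def mle(true_prior, est_prior, gamma, ymax=400):
    """Mean loss E_P[ l(X, E_Q[X|Y_γ]) ], by summing over the output alphabet."""
    if gamma == 0.0:
        xhat0 = sum(x * w for x, w in est_prior.items())
        return sum(w * loss(x, xhat0) for x, w in true_prior.items())
    total = 0.0
    for y in range(ymax):
        qy = sum(w * pois(y, gamma * x) for x, w in est_prior.items())
        xhat = sum(x * w * pois(y, gamma * x) for x, w in est_prior.items()) / qy
        total += sum(w * pois(y, gamma * x) * loss(x, xhat) for x, w in true_prior.items())
    return total

gammas = np.linspace(0.0, 40.0, 401)                 # truncate the integral at γ = 40
excess = np.array([mle(P, Q, g) - mle(P, P, g) for g in gammas])
rhs = (excess.sum() - 0.5 * (excess[0] + excess[-1])) * (gammas[1] - gammas[0])
lhs = sum(w * log(w / Q[x]) for x, w in P.items())   # D(P||Q) for these two-point priors
print(lhs, rhs)                                      # the two numbers should be close
```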
Monday, June 3, 2013
quest for
This result was key in [33], where it was shown that the relationship between the causal and non-causal MMSEs continues to hold also in the mismatched case, i.e. when the filters are optimized for an underlying signal distribution that differs from the true one. The bridge between the two sides of the equality in this mismatched case was shown to be the sum of the mutual information and the relative entropy between the true and mismatched output distributions, this relative entropy thus quantifying the penalty due to mismatch.
Consider now the Poisson channel, by which we mean, for the case of scalar random variables, that X, the input, is a non-negative random variable while the conditional distribution of the output Y given the input is given by Poisson(γ · X), the parameter γ ≥ 0 here playing the role of SNR. In the continuous time setting, the channel input is X^T = {Xt, 0 ≤ t ≤ T}, a non-negative stochastic process, and conditionally on X^T, the output Y^T = {Yt, 0 ≤ t ≤ T} is a non-homogeneous Poisson process with intensity function γ · X^T. Often referred to as the “ideal Poisson channel” [19], this model is the canonical one for describing direct detection optical communication: The channel input represents the squared magnitude of the electric field incident on the photo-detector, while its output is the counting process describing the arrival times of the photons registered by the detector. Here the energy of the channel input signal is proportional to its l1 norm, rather than the l2 norm as in the Gaussian channel. Thus it is the amplification factor γ rather than γ² that plays the role of SNR. We refer to [32] for a review of the literature on the Poisson channel and its communication theoretic significance, and to [11] and references therein for applications of Poisson channel models in other fields.
The function ℓ0(x) = x log x − x + 1, x > 0 (where log denotes the natural logarithm throughout), being the convex conjugate of the Poisson distribution’s log moment generating function, arises naturally in analysis of Poisson and continuous time jump Markov processes in a variety of situations. These include relative entropy representation for jump Markov processes (see, e.g., equation (3.20) and Theorem 3.3 of [8]), large deviation local rate function for such processes ([8], Chapter 5 of [29]), mutual information in the Poisson channel (Section 19.5 and equation (19.135) of [20]), and logarithmic transformations in stochastic control theory (Section 3 of [9]). It is also intimately related to change-of-measure formulae for point processes in the spirit of the Girsanov transformation (Section VI.(5.5–6) of [4], [16], [28]). It is therefore not surprising that the function ℓ0 appears in this paper in representations for relative entropy and related calculations. It is less obvious, however, that using it to define estimation loss turns out to be very useful and, in particular, gives rise to a number of results that parallel the Gaussian theory.
Enter the loss function ℓ : [0,∞) × [0,∞) → [0,∞] defined by x̂ · ℓ0(x/x̂) or, more precisely,

    ℓ(x, x̂) = x log(x/x̂) − x + x̂,    (1)

where the right hand side of (1) is well-defined as an extended non-negative real number in view of our conventions 0 log 0 = 0, 0 log(0/0) = 0, c/0 = ∞ and log(c/0) = ∞ for c > 0. In Section 2, we exhibit properties of this loss function that show it is a natural one for measuring goodness of reconstruction of non-negative objects, and that it shares some of its key properties with the squared error loss, such as optimality of the conditional expectation under the mean loss criterion.
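The optimality property mentioned at the end of the paragraph (the expectation minimizes the mean loss under ℓ, just as for squared error) is easy to see numerically; the gamma-distributed X in the sketch below is an arbitrary example.

```python
# Under l(x, xhat) = x log(x/xhat) - x + xhat, the mean loss E[l(X, c)] over
# constants c is minimized at c = E[X]; here we confirm this on samples.
import numpy as np

rng = np.random.default_rng(1)
X = rng.gamma(shape=2.0, scale=1.5, size=100_000)     # a non-negative X, E[X] = 3

def mean_loss(c):
    return np.mean(X * np.log(X / c) - X + c)

cs = np.linspace(0.5, 8.0, 751)
best = cs[np.argmin([mean_loss(c) for c in cs])]
print(best, X.mean())     # the numerical minimizer should sit near the sample mean
```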
The goal of this paper is to show that a set of relations identical to those that hold for the Gaussian channel – ranging from Duncan's formula [7], to the I-MMSE of [13, 34], to Verdu's relationship between relative entropy and mismatched estimation [31], to the relationship between causal and non-causal estimation in continuous time for matched [13] and mismatched [33] filters – hold for the Poisson channel upon replacing the squared error loss by the loss function in (1).
It is instructive to note that while the relative entropy between two Gaussians of the same variance and means m1 and m2 is equal to (m1 − m2)², that between two exponentials of parameters λ1 and λ2 is equal to ℓ(λ1, λ2) (with additional multiplicative terms in both cases). Although this simple fact does not exclusively explain the Gaussian-Poissonian analogy, it lies at its heart, along with further properties of ℓ observed in Section 2.
Monday, June 3, 2013
[26] D. P. Palomar and S. Verdu, “Representation of Mutual Information via Input Estimates,” IEEE Trans. Information Theory, vol. 53, no. 2, pp. 453-470, Feb. 2007.
[27] B. Y. Ryabko, “Encoding a source with unknown but ordered probabilities,” Probl. Inf. Transm., pp. 134-139, Oct. 1979.
[28] A. Segall and T. Kailath, “Radon-Nikodym derivatives with respect to measures induced by discontinuous independent-increment processes,” Ann. Probab., vol. 3, no. 3, pp. 449-464, 1975.
[29] A. Shwartz and A. Weiss, Large Deviations for Performance Analysis: Queues, Communications, and Computing. Chapman & Hall, London, 1995.
[30] A. M. Tulino and S. Verdu, “Monotonic Decrease of the Non-Gaussianness of the Sum of Independent Random Variables: A Simple Proof,” IEEE Trans. Information Theory, vol. 52, no. 9, pp. 4295-4297, Sep. 2006.
[31] S. Verdu, “Mismatched estimation and relative entropy,” IEEE Trans. Information Theory, vol. 56, no. 8, pp. 3712-3720, Aug. 2010.
[32] S. Verdu, “Poisson communication theory,” International Technion Communication Day in Honor of Israel Bar-David, March 1999.
[33] T. Weissman, “The Relationship Between Causal and Noncausal Mismatched Estimation in Continuous-Time AWGN Channels,” IEEE Trans. Information Theory, vol. 56, no. 9, pp. 4256-4273, Sep. 2010.
[34] M. Zakai, “On mutual information, likelihood ratios, and estimation error for the additive Gaussian channel,” IEEE Trans. Information Theory, vol. 51, no. 9, pp. 3017-3024, Sep. 2005.
Figure 1: The loss function ℓ. (a) ℓ(1, x); (b) ℓ(x, 1) = x log x − x + 1.
Figure 2: The curves mleP,P(γ), cmleP,P(γ), mleP,Q(γ) and cmleP,Q(γ), marked respectively by A, B, C, D, of the example in Section 8.1, plotted here for p = 1/2 and q = 1/5.
Monday, June 3, 2013
An observation (and hint)

• Continuous-time Channel: X^T a non-negative stochastic process; Y^T_γ | X^T non-homogeneous Poisson of intensity γ · X^T

• Note:

    D(exp(λ1) || exp(λ2)) = (1/λ1) · ℓ(λ1, λ2)

    D(Poisson(λ1) || Poisson(λ2)) = ℓ(λ1, λ2)

Compare with

    D(N(µ1, σ²) || N(µ2, σ²)) = (1/(2σ²)) · (µ1 − µ2)²,    D(N(µ1, 1) || N(µ2, 1)) = (1/2) · (µ1 − µ2)²
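Both divergence identities in the note are one-line calculations, and can be confirmed numerically as well; the parameter values in the sketch below are arbitrary.

```python
# Check of D(Poisson(λ1)||Poisson(λ2)) = l(λ1, λ2) and
# D(Exp(λ1)||Exp(λ2)) = (1/λ1)·l(λ1, λ2), where l(x, xhat) = x log(x/xhat) - x + xhat
# and Exp(λ) has density λ exp(-λx).
from math import exp, log, lgamma

def loss(x, xhat):
    return x * log(x / xhat) - x + xhat

def pois(y, lam):
    return exp(y * log(lam) - lam - lgamma(y + 1))

l1, l2 = 2.5, 0.8

# Poisson: log ratio of the two pmfs is y log(λ1/λ2) - λ1 + λ2, summed against Poisson(λ1).
d_pois = sum(pois(y, l1) * (y * log(l1 / l2) - l1 + l2) for y in range(200))
print(d_pois, loss(l1, l2))

# Exponential: closed form D = log(λ1/λ2) + λ2/λ1 - 1.
d_exp = log(l1 / l2) + l2 / l1 - 1.0
print(d_exp, loss(l1, l2) / l1)
```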
•

    I(X^T; Y^T_γ) = γ · E[ ∫_0^T ℓ(Xt, E[Xt|Y^t_γ]) dt ]

    cmle_{P,Q}(snr) = (1/snr) · [ I(X^T; Y^T_snr) + D(P_{Y^T_snr} || Q_{Y^T_snr}) ]

    cmmle(γ) = E[ ∫_0^T ℓ(Xt, E[Xt|Y^t_γ]) dt ],    mmle(γ) = E[ ∫_0^T ℓ(Xt, E[Xt|Y^T_γ]) dt ]

    cmmle(snr) = (1/snr) ∫_0^snr mmle(γ) dγ

Relationship between cmmle and mmle
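A short sketch of where the factor γ comes from, assuming the standard point-process mutual information formula I(X^T;Y^T_γ) = E∫_0^T [λt log λt − λ̂t log λ̂t] dt with λt = γXt and λ̂t = E[λt|Y^t_γ] (cf. the pointer to [20] above), and writing X̂t := E[Xt|Y^t_γ]:

\begin{align*}
I\big(X^T;Y^T_\gamma\big)
 &= \mathbb{E}\int_0^T\Big[\gamma X_t\log(\gamma X_t)-\gamma\hat X_t\log(\gamma\hat X_t)\Big]\,dt\\
 &= \gamma\,\mathbb{E}\int_0^T\Big[X_t\log X_t-\hat X_t\log\hat X_t\Big]\,dt
    \;+\;\gamma\log\gamma\,\mathbb{E}\int_0^T\big[X_t-\hat X_t\big]\,dt\\
 &= \gamma\,\mathbb{E}\int_0^T\ell\big(X_t,\hat X_t\big)\,dt
    \;=\;\gamma\cdot\mathrm{cmmle}(\gamma),
\end{align*}

using E[Xt − X̂t] = 0 and E[ℓ(Xt, X̂t)] = E[Xt log Xt − X̂t log X̂t], exactly as in (40)–(42).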
Monday, June 3, 2013
Punch Line
[Rami Atar and T.W. 2012]:
under the above
Our emphasis is on the results for the mismatched setting, relating the cost of mismatch to relative entropy in the Poisson channel. The results for the exact (i.e., non-mismatched) setting, relating the minimum mean loss to mutual information, and causal to non-causal minimum mean estimation loss, are shown to follow as special cases. The latter results, for the exact setting, are consistent and in fact coincide with those of [14] – which considered a more general Poisson channel model that accommodates the presence of dark current – when specialized to the case of zero dark current. Our framework complements the results of [14] not only in extending the scope to the presence
Monday, June 3, 2013
and I mean everything
• i-mmse
• Duncan
• causal - non-causal
• mismatch
• minimax
Monday, June 3, 2013
the universal picture
Monday, June 3, 2013
universal denoising
Monday, June 3, 2013
universal probability assignments:
X1, X2, X3, . . . , Xi−1, Xi, . . .
Y1, Y2, Y3, . . . , Yi−1, Yi, . . .

    I(Xi; Yi | Y^{i−1}),    I(Y^{i−1}; Xi | X^{i−1})

    C = lim_{n→∞} max (1/n) I(X^n → Y^n)

    I(X^n → Y^n) ≫ I(Y^{n−1} → X^n) ⇒ “X causes Y”
    I(X^n → Y^n) ≪ I(Y^{n−1} → X^n) ⇒ “Y causes X”
    I(X^n → Y^n) ≈ I(Y^{n−1} → X^n) ≫ 0 ⇒ “X and Y are causing each other”
    I(X^n; Y^n) ≈ 0 ⇒ X and Y are essentially independent

    I(X → Y) = lim_{n→∞} (1/n) I(X^n → Y^n)

Q is universal if

    lim_{n→∞} (1/n) D(P_{X^n} || Q_{X^n}) = 0

for every stationary P, and pointwise universal if

    lim sup_{n→∞} (1/n) log [ P_{X^n}(X^n) / Q_{X^n}(X^n) ] ≤ 0    P-a.s.

for every stationary and ergodic P.
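To make the definitions concrete, here is a minimal sketch of a sequential probability assignment that is universal (and pointwise universal) over binary memoryless sources: the standard Krichevsky–Trofimov add-1/2 estimator. This is a textbook example rather than anything specific to these notes; universality over all stationary sources requires richer schemes such as CTW or Lempel-Ziv mentioned on the next slide. The source parameter below is arbitrary.

```python
# Pointwise universality of the Krichevsky–Trofimov (KT) sequential assignment
# over an i.i.d. Bernoulli(p) source: (1/n) log[P(x^n)/Q(x^n)] shrinks toward 0,
# since -log2 Q(x^n) exceeds -log2 of the maximum likelihood by at most
# about (1/2) log2 n + 1 bits.
import numpy as np

def kt_log_prob(x):
    """log2 Q_KT(x^n), assigning Q(x_{i+1}=1 | x^i) = (#ones + 1/2) / (i + 1)."""
    logq, ones = 0.0, 0
    for i, b in enumerate(x):
        p1 = (ones + 0.5) / (i + 1.0)
        logq += np.log2(p1 if b == 1 else 1.0 - p1)
        ones += b
    return logq

rng = np.random.default_rng(0)
p = 0.2
for n in (100, 1000, 10000, 100000):
    x = (rng.random(n) < p).astype(int)
    k = int(x.sum())
    log_p_true = k * np.log2(p) + (n - k) * np.log2(1 - p)   # log2 P_{X^n}(x^n)
    print(n, (log_p_true - kt_log_prob(x)) / n)              # (1/n) log[P(x^n)/Q(x^n)] → 0
```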
Monday, June 3, 2013
universal compressors (e.g.: Lempel-Ziv 78, CTW)
X1, X2, X3, . . . , Xi−1, Xi, . . .
Y1, Y2, Y3, . . . , Yi−1, Yi, . . .

    H(X) = Σ_x P_X(x) log (1 / P_X(x))

• I(X;Y) = I(Y;X)
• I(f(X); g(Y)) = I(X;Y) if f and g are one-to-one
• chain rules

    I(X;Y),    I(Xi; Yi | Y^{i−1}),    I(Y^{i−1}; Xi | X^{i−1})

    C = lim_{n→∞} max (1/n) I(X^n → Y^n)

    I(X^n → Y^n) ≫ I(Y^{n−1} → X^n) ⇒ “X causes Y”
    I(X^n → Y^n) ≪ I(Y^{n−1} → X^n) ⇒ “Y causes X”
    I(X^n → Y^n) ≈ I(Y^{n−1} → X^n) ≫ 0 ⇒ “X and Y are causing each other”
    I(X^n; Y^n) ≈ 0 ⇒ X and Y are essentially independent

    I(X → Y) = lim_{n→∞} (1/n) I(X^n → Y^n)

Q is universal if

    lim_{n→∞} (1/n) D(P_{X^n} || Q_{X^n}) = 0

for every stationary P, and pointwise universal if

    lim sup_{n→∞} (1/n) log [ P_{X^n}(X^n) / Q_{X^n}(X^n) ] ≤ 0    P-a.s.

for every stationary and ergodic P.
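The causality heuristics above can be tried out exactly on a small toy joint law by brute-forcing the conditional mutual informations in the directed-information sums. The model below (X i.i.d. fair bits, Yi a noisy copy of Xi−1) is an arbitrary example of "X causes Y"; the horizon and noise level are likewise arbitrary.

```python
# Brute-force the directed information sums for a toy causal model:
# X_i i.i.d. Bernoulli(1/2), Y_1 an independent fair bit, Y_i = X_{i-1} XOR Z_i,
# Z_i i.i.d. Bernoulli(q). Forward directed information should be large,
# reverse (delayed) directed information should be ~ 0.
import itertools
from math import log2
from collections import defaultdict

n, q = 4, 0.1

P = {}
for x in itertools.product((0, 1), repeat=n):
    for y in itertools.product((0, 1), repeat=n):
        prob = 0.5 ** n * 0.5                       # X^n and the independent Y_1
        for i in range(1, n):
            prob *= (1 - q) if y[i] == x[i - 1] else q
        P[(x, y)] = prob

def marg(keyfun):
    m = defaultdict(float)
    for (x, y), pr in P.items():
        m[keyfun(x, y)] += pr
    return m

def cond_mi(i, forward=True):
    """I(X^i; Y_i | Y^{i-1}) if forward, else I(Y^{i-1}; X_i | X^{i-1}); i is 1-based."""
    if forward:
        a, b, c = (lambda x, y: x[:i]), (lambda x, y: (y[i - 1],)), (lambda x, y: y[:i - 1])
    else:
        a, b, c = (lambda x, y: y[:i - 1]), (lambda x, y: (x[i - 1],)), (lambda x, y: x[:i - 1])
    pabc = marg(lambda x, y: (a(x, y), b(x, y), c(x, y)))
    pac, pbc, pc = marg(lambda x, y: (a(x, y), c(x, y))), marg(lambda x, y: (b(x, y), c(x, y))), marg(c)
    return sum(p * log2(p * pc[k[2]] / (pac[(k[0], k[2])] * pbc[(k[1], k[2])]))
               for k, p in pabc.items() if p > 0)

di_forward = sum(cond_mi(i, True) for i in range(1, n + 1))
di_reverse = sum(cond_mi(i, False) for i in range(1, n + 1))
print(di_forward, di_reverse)   # ≈ 3·(1 − h(0.1)) ≈ 1.59 bits forward, ≈ 0 reverse
```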
universal probability assignment
univ. sequential prob. assignment
(much more in ee376c)
univ. prediction, filtering, denoising, lossy compression
Monday, June 3, 2013