Elements of Information Theory
Materials from the book by T. M. Cover and J. A. Thomas, “Elements of Information Theory”, Wiley.
Measure of Information (of an event)
• Given a probability mass function (pmf) p(x) of a random variable X.
• The information associated with an event of probability p(x) is defined as

I(x) = − log[ p(x) ]   Units: bits

• Less frequent event → A LOT OF information.
• More frequent event → SMALL information.
• The base of the log is 2 (we do not lose generality).
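As a quick numerical illustration (a Python sketch; the `information` helper name is ours, not from the slides):

```python
import math

def information(p):
    """Self-information of an event with probability p, in bits."""
    return -math.log2(p)

# Less frequent event -> more information; more frequent -> less.
print(information(0.01))  # rare event: ~6.64 bits
print(information(0.5))   # fair coin flip: 1.0 bit
print(information(1.0))   # 0 bits: a certain event carries no information
```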
Discrete Entropy
• Expected value of the information:

H(X) = HX = − Σ_{i=1..N} p(x = i) log[ p(x = i) ]

• IT IS A SCALAR VALUE.
• It can be considered as a DISPERSION MEASURE of the pmf p(x).
• The notation H(X) means that it is related to the r.v. X.
• H(X) represents the UNCERTAINTY over the values that the random variable X can take.
Discrete Entropy
The entropy does not depend on the values that the r.v. X can take (in the example above they can be considered generic math variables or simply “letters”…).
IMPORTANT:
H(X) = 0 if the pmf is of the type 0, 0, 0, 1, 0, …, 0 (a delta).
H(X) = log2 N (i.e., its maximum value) if the pmf is of the type 1/N, 1/N, …, 1/N (uniform).
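These two extreme cases can be checked numerically; a small Python sketch (the `entropy` helper is our own, not from the slides):

```python
import math

def entropy(pmf):
    """Discrete entropy H(X) in bits, with the convention 0*log2(0) = 0."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

N = 8
delta   = [0, 0, 0, 1, 0, 0, 0, 0]   # all probability mass on one value
uniform = [1 / N] * N                # all N values equally likely

print(entropy(delta))    # 0 bits: no uncertainty
print(entropy(uniform))  # 3.0 bits = log2(8), the maximum for N = 8
```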
Entropy: measure of dispersion
• H(X) is a measure of DISPERSION (UNCERTAINTY).
• We do not consider the continuous scenario here: the differential entropy (continuous case) is maximized, for a fixed variance, when p(x) is a Gaussian density.
MAX DISCRETE ENTROPY: UNIFORM PMF
p(x = i) = 1/N for all i  ⇒  HX = log2 N

MIN DISCRETE ENTROPY: DELTA
p(x = i) = 1 for a single value, 0 elsewhere  ⇒  HX = 0
(using the convention 0 · log2 0 = 0)

[Figure: bar plots of the uniform pmf (all N bars at height 1/N) and the delta pmf (a single bar at height 1).]
Relationship with the variance
• Another dispersion measure is the variance. BUT the variance depends on the support of the r.v. X (i.e., the values that X can take).
• For instance, we can permute the positions of the deltas and the entropy does not change.
In these two pmfs the entropy is the same, but the variance is not!
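A quick check of this point (Python sketch; the two-delta pmfs are our own example):

```python
import math

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def variance(support, pmf):
    mean = sum(x * p for x, p in zip(support, pmf))
    return sum(p * (x - mean) ** 2 for x, p in zip(support, pmf))

support = [0, 1, 2, 3]
pmf_a = [0.5, 0.5, 0.0, 0.0]   # two deltas close together
pmf_b = [0.5, 0.0, 0.0, 0.5]   # same two deltas, permuted far apart

# Entropy is identical (it only sees the probabilities)...
print(entropy(pmf_a), entropy(pmf_b))              # 1.0 1.0
# ...but the variance changes with the support positions.
print(variance(support, pmf_a), variance(support, pmf_b))  # 0.25 2.25
```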
- Which is the more “informative” system?
The first one: 2 events with probabilities 0.9 and 0.1.
The second one: 2 events with probabilities 0.5 and 0.5.
We have more “questions” in the second case…
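Computing the entropy of both systems confirms the intuition (Python sketch; the `entropy` helper is ours):

```python
import math

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(entropy([0.9, 0.1]))  # ~0.469 bits
print(entropy([0.5, 0.5]))  # 1.0 bit: the second system is more "informative"
```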
![Page 17: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/17.jpg)
[ ]
[ ]
![Page 18: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/18.jpg)
Joint Entropy of two r.v.’s X, Y
H(X,Y) = HXY = − Σ_{i=1..N} Σ_{j=1..L} p(x = i, y = j) log[ p(x = i, y = j) ]
![Page 19: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/19.jpg)
Conditional Entropy - Y|X

H(Y|X) = HY|X = − Σ_{i=1..N} Σ_{j=1..L} p(x = i, y = j) log[ p(y = j | x = i) ]
This guy does not need presentation… it is a standard entropy!
![Page 20: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/20.jpg)
Relationship among entropy, joint entropy and conditional entropy
![Page 21: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/21.jpg)
![Page 22: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/22.jpg)
This is the joint pmf…
Are you able to find the marginal and conditional pmfs?
“Joint”
![Page 23: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/23.jpg)
Relative entropy – KL divergence

D(p || q) = Σ_x p(x) log[ p(x) / q(x) ]
![Page 24: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/24.jpg)
Relative entropy – KL divergence
Note that, again, it is not symmetric; it is quite useful for causality analysis (an important concept, especially in biomedical applications).
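A quick numerical illustration of the asymmetry (Python sketch; the two pmfs are our own choice):

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits (assumes q[i] > 0 wherever p[i] > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q))  # D(p||q) ≈ 0.737 bits
print(kl_divergence(q, p))  # D(q||p) ≈ 0.531 bits: different -> not symmetric
print(kl_divergence(p, p))  # 0.0: zero when the two pmfs coincide
```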
![Page 25: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/25.jpg)
![Page 26: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/26.jpg)
MUTUAL INFORMATION
I(X;Y) = IXY = − Σ_{i=1..N} Σ_{j=1..L} p(x = i, y = j) log[ p(x = i) p(y = j) / p(x = i, y = j) ]
It is symmetric.
![Page 27: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/27.jpg)
MUTUAL INFORMATION
IMPORTANT: WE CAN STUDY THE DEPENDENCY/INDEPENDENCY BETWEEN RANDOM VARIABLES (different from the correlation coefficient…).
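A classic illustration of the difference: with X uniform on {−1, 0, 1} and Y = X², the correlation between X and Y is zero but the mutual information is not. A Python sketch (the example is ours, not from the slides):

```python
import math
from collections import Counter

# X uniform on {-1, 0, 1} and Y = X**2: uncorrelated but clearly dependent.
joint = {(x, x * x): 1 / 3 for x in [-1, 0, 1]}

px = Counter(); py = Counter()
for (x, y), p in joint.items():
    px[x] += p
    py[y] += p

# The covariance E[XY] - E[X]E[Y] is zero, so the correlation coefficient is 0.
exy = sum(x * y * p for (x, y), p in joint.items())
ex = sum(x * p for x, p in px.items())
ey = sum(y * p for y, p in py.items())
print(exy - ex * ey)  # 0.0

# Mutual information still detects the dependency (here I(X;Y) = H(Y) ≈ 0.918 bits).
mi = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items())
print(mi)
```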
![Page 28: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/28.jpg)
Relationship between ENTROPY and MUTUAL INFORMATION
![Page 29: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/29.jpg)
Example - Joint pmf
![Page 30: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/30.jpg)
“Information can’t hurt”: conditioning reduces entropy, H(X|Y) ≤ H(X).
![Page 31: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/31.jpg)
![Page 32: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/32.jpg)
SUMMARY
• Recall the definitions:

H(X,Y) = HXY = − Σ_{i=1..N} Σ_{j=1..L} p(x = i, y = j) log[ p(x = i, y = j) ]

I(X;Y) = IXY = − Σ_{i=1..N} Σ_{j=1..L} p(x = i, y = j) log[ p(x = i) p(y = j) / p(x = i, y = j) ]

H(X|Y) = HX|Y = − Σ_{i=1..N} Σ_{j=1..L} p(x = i, y = j) log[ p(x = i | y = j) ]

H(Y|X) = HY|X = − Σ_{i=1..N} Σ_{j=1..L} p(x = i, y = j) log[ p(y = j | x = i) ]

Recall that:

p(x,y) = p(y|x) p(x) = p(x|y) p(y)
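The definitions above can be checked together on a small joint pmf; a Python sketch (the 2×2 pmf values are our own choice):

```python
import math
from collections import Counter

log2 = math.log2

# A small joint pmf p(x, y), chosen just for illustration.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

px = Counter(); py = Counter()
for (x, y), p in joint.items():
    px[x] += p
    py[y] += p

H_X  = -sum(p * log2(p) for p in px.values())
H_Y  = -sum(p * log2(p) for p in py.values())
H_XY = -sum(p * log2(p) for p in joint.values())
# Conditional entropies use p(x|y) = p(x,y)/p(y) and p(y|x) = p(x,y)/p(x).
H_X_given_Y = -sum(p * log2(p / py[y]) for (x, y), p in joint.items())
H_Y_given_X = -sum(p * log2(p / px[x]) for (x, y), p in joint.items())
I_XY = sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items())

# The chain rules tie everything together:
print(abs(H_XY - (H_X + H_Y_given_X)) < 1e-12)   # True
print(abs(H_XY - (H_Y + H_X_given_Y)) < 1e-12)   # True
print(abs(I_XY - (H_X - H_X_given_Y)) < 1e-12)   # True
```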
![Page 33: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/33.jpg)
SUMMARY - RELATIONSHIPS
![Page 34: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/34.jpg)
SUMMARY - RELATIONSHIPS
![Page 35: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/35.jpg)
RELATIONSHIPS
[Venn diagram: two overlapping circles, HX (red) and HY (yellow); the overlap is IXY, the non-overlapping parts are HX|Y and HY|X, and the union Red+Yellow is the joint entropy HXY.]
![Page 36: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/36.jpg)
RELATIONSHIPS

We can obtain the identities and inequalities:

HXY = HX + HY − IXY
HXY = HX|Y + HY|X + IXY
HXY = HX + HY|X
HXY = HY + HX|Y

HX = HX|Y + IXY
HY = HY|X + IXY

IXY = HX − HX|Y
IXY = HY − HY|X
IXY = HX + HY − HXY
IXY = IYX

HXY ≤ HX + HY
HX ≤ HXY ≤ HX + HY
HY ≤ HXY ≤ HX + HY
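These identities and bounds can be spot-checked on an arbitrary joint pmf; a Python sketch (the random 3×4 pmf is our own construction):

```python
import math
import random
from collections import Counter

random.seed(1)  # fixed seed so the spot-check is reproducible

# Random 3x4 joint pmf, just to exercise the formulas.
weights = {(i, j): random.random() for i in range(3) for j in range(4)}
total = sum(weights.values())
joint = {k: w / total for k, w in weights.items()}

px = Counter(); py = Counter()
for (x, y), p in joint.items():
    px[x] += p
    py[y] += p

H_X  = -sum(p * math.log2(p) for p in px.values())
H_Y  = -sum(p * math.log2(p) for p in py.values())
H_XY = -sum(p * math.log2(p) for p in joint.values())
I_XY = H_X + H_Y - H_XY

print(H_XY <= H_X + H_Y)       # True: HXY <= HX + HY
print(max(H_X, H_Y) <= H_XY)   # True: HX, HY <= HXY
print(I_XY >= 0)               # True: IXY = HX + HY - HXY >= 0
```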
![Page 37: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/37.jpg)
Independent Variables
IXY = 0
HX = HX|Y
HY = HY|X
HXY = HX + HY

The joint entropy is at its maximum, and the mutual information IXY = 0 is at its minimum.

[Venn diagram: the circles HX and HY are disjoint, so there is no overlap IXY.]
![Page 38: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/38.jpg)
Case X=Y (totally dependent)
HXY = HX = HY = IXY
HX|Y = 0
HY|X = 0

[Venn diagram: the circles HX and HY coincide completely, so the whole region is IXY.]
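A quick numerical check of the totally dependent case (Python sketch; the uniform 4-value example is ours):

```python
import math
from collections import Counter

log2 = math.log2

# X uniform on {0, 1, 2, 3} and Y = X: total dependence.
joint = {(x, x): 0.25 for x in range(4)}

px = Counter(); py = Counter()
for (x, y), p in joint.items():
    px[x] += p
    py[y] += p

H_X = -sum(p * log2(p) for p in px.values())
I_XY = sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items())
H_X_given_Y = -sum(p * log2(p / py[y]) for (x, y), p in joint.items())

print(H_X)                   # 2.0
print(I_XY)                  # 2.0 -> I(X;Y) = H(X) = H(Y) = H(X,Y)
print(H_X_given_Y == 0.0)    # True: knowing Y removes all uncertainty about X
```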
![Page 39: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/39.jpg)
Important formulas
• Recall:

0 ≤ HX ≤ log2 N   (min: p(x) delta; max: p(x) uniform)
0 ≤ HY ≤ log2 L
HX ≤ HXY ≤ HX + HY   (min, with HXY = HX = HY, when X = Y; max for independent variables)
0 ≤ IXY ≤ HX (= HY)   (min for independent variables; max when X = Y)
0 ≤ HX|Y ≤ HX   (min when X = Y; max for independent variables)
0 ≤ HY|X ≤ HY   (min when X = Y; max for independent variables)
![Page 40: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/40.jpg)
Data-processing inequalities
The more processing on the data, the more information can be lost: if X → Y → Z forms a Markov chain, then I(X;Y) ≥ I(X;Z).
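One way to see the data-processing inequality at work is a toy Markov chain X → Y → Z, where each arrow flips a bit with some probability; a Python sketch (the fair input bit and the 0.1 flip probability are our own choices):

```python
import math
from collections import Counter

def mutual_information(joint):
    """I(A;B) in bits from a joint pmf given as {(a, b): prob}."""
    pa = Counter(); pb = Counter()
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# Markov chain X -> Y -> Z: each step flips the bit with probability eps.
eps = 0.1
p_xy = {}
p_xz = {}
for x in (0, 1):
    for y in (0, 1):
        pxy = 0.5 * (eps if y != x else 1 - eps)
        p_xy[(x, y)] = pxy
        for z in (0, 1):
            pz = eps if z != y else 1 - eps
            p_xz[(x, z)] = p_xz.get((x, z), 0.0) + pxy * pz

print(mutual_information(p_xy))  # I(X;Y)
print(mutual_information(p_xz))  # I(X;Z): smaller -> processing lost information
```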
![Page 41: Elements(of( Informaon(Theory( - AltervistaRelaonship(with(the(variance(• Another(dispersion(measure(is(the(variance.(BUT(the(variance(depends(on(the(supportof(the(r.v.(X((i.e.,(the(values(than(Xcan(take)](https://reader036.vdocuments.mx/reader036/viewer/2022071023/5fd814a39095cc6b3141b0e7/html5/thumbnails/41.jpg)
Some material is from the book by T. M. Cover and J. A. Thomas, “Elements of Information Theory”, Wiley.