8/11/2019 Yunshu_InformationGeometry
1/79
Introduction to Information Geometry
based on the book Methods of Information Geometry written by
Shun-Ichi Amari and Hiroshi Nagaoka
Yunshu Liu
2012-02-17
Introduction to differential geometry Geometric structure of statistical models and statistical inference
Outline
1 Introduction to differential geometry
Manifold and Submanifold
Tangent vector, Tangent space and Vector field
Riemannian metric and Affine connection
Flatness and autoparallel
2 Geometric structure of statistical models and statistical inference
The Fisher metric and $\alpha$-connection
Exponential family
Divergence and Geometric statistical inference
Yunshu Liu (ASPITRG) Introduction to Information Geometry 2 / 79
Part I
Introduction to differential geometry
Basic concepts in differential geometry
Basic concepts
Manifold and Submanifold
Tangent vector, Tangent space and Vector field
Riemannian metric and Affine connection
Flatness and autoparallel
Manifold
Manifold S
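Definition: Let $S$ be a set. If there exists a set of coordinate systems $\mathcal{A}$ for $S$ which satisfies conditions (1) and (2) below, we call $S$ an n-dimensional $C^\infty$ differentiable manifold.
(1) Each element $\varphi$ of $\mathcal{A}$ is a one-to-one mapping from $S$ to some open subset of $\mathbb{R}^n$.
(2) For all $\varphi \in \mathcal{A}$, given any one-to-one mapping $\psi$ from $S$ to $\mathbb{R}^n$, the following holds:
\[ \psi \in \mathcal{A} \iff \psi \circ \varphi^{-1} \text{ is a } C^\infty \text{ diffeomorphism.} \]
Here, by a $C^\infty$ diffeomorphism we mean that $\psi \circ \varphi^{-1}$ and its inverse $\varphi \circ \psi^{-1}$ are both $C^\infty$ (infinitely many times differentiable).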
Examples of Manifold
Examples of one-dimensional manifold
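A straight line: a manifold in $\mathbb{R}^1$, even if it is given in $\mathbb{R}^k$ for $k \ge 2$.
Any open subset of a straight line: one-dimensional manifold
A closed interval of a straight line (endpoints included): not a manifold in the sense above, since no neighborhood of an endpoint maps onto an open subset of $\mathbb{R}^1$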
A circle: locally the circle looks like a line
Any open subset of a circle: one-dimensional manifold
A closed arc of a circle (endpoints included): not a manifold.
Examples of Manifold: surface of a torus
The torus in $\mathbb{R}^3$ (surface of a doughnut):
\[ x(u, v) = ((a + b\cos u)\cos v,\ (a + b\cos u)\sin v,\ b\sin u), \quad 0 \le u, v < 2\pi, \]
where $a$ is the distance from the center of the tube to the center of the torus, and $b$ is the radius of the tube. A torus is a closed surface defined as the product of two circles: $T^2 = S^1 \times S^1$.
n-torus: $T^n$ is defined as a product of $n$ circles: $T^n = S^1 \times S^1 \times \cdots \times S^1$.
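A minimal numerical sketch of the torus parametrization above. It checks that sampled points satisfy the implicit equation $(\sqrt{x^2 + y^2} - a)^2 + z^2 = b^2$, which holds when $a > b$. The function names and the values $a = 2$, $b = 1$ are illustrative assumptions, not part of the slides.

```python
import math

def torus_point(u, v, a=2.0, b=1.0):
    """Map surface coordinates (u, v) to a point on the torus in R^3."""
    x = (a + b * math.cos(u)) * math.cos(v)
    y = (a + b * math.cos(u)) * math.sin(v)
    z = b * math.sin(u)
    return x, y, z

def torus_residual(point, a=2.0, b=1.0):
    """Residual of the implicit equation (sqrt(x^2 + y^2) - a)^2 + z^2 = b^2."""
    x, y, z = point
    return (math.hypot(x, y) - a) ** 2 + z ** 2 - b ** 2

# Sample a 12 x 12 grid of (u, v) values in [0, 2*pi) and keep the worst residual.
grid = [2 * math.pi * k / 12 for k in range(12)]
max_residual = max(abs(torus_residual(torus_point(u, v))) for u in grid for v in grid)
print(max_residual)  # should be at floating-point rounding level
```

The pair $(u, v)$ plays the role of a local coordinate system: two numbers suffice to locate a point on the surface, even though the surface sits in $\mathbb{R}^3$.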
Coordinate systems for manifold
Parametrization of unit hemisphere
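Parametrization: map of the unit hemisphere into $\mathbb{R}^2$ by stereographic projection.
\[ x(u, v) = \left( \frac{2u}{1 + u^2 + v^2},\ \frac{2v}{1 + u^2 + v^2},\ \frac{1 - u^2 - v^2}{1 + u^2 + v^2} \right), \quad \text{where } u^2 + v^2 \le 1 \tag{2} \]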
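As a quick sanity check of equation (2), the sketch below evaluates the parametrization at a few points of the unit disk and confirms that the image has unit norm and non-negative third coordinate, i.e. lies on the upper unit hemisphere. The function name and sample points are illustrative assumptions.

```python
import math

def hemisphere_point(u, v):
    """Inverse stereographic projection of (u, v) onto the unit sphere, as in (2)."""
    d = 1 + u * u + v * v
    return (2 * u / d, 2 * v / d, (1 - u * u - v * v) / d)

# A few sample points with u^2 + v^2 <= 1.
samples = [(0.0, 0.0), (0.3, -0.4), (0.6, 0.8), (-1.0, 0.0)]
norms = [math.sqrt(sum(c * c for c in hemisphere_point(u, v))) for u, v in samples]
heights = [hemisphere_point(u, v)[2] for u, v in samples]
print(norms, heights)
```

The center of the disk maps to the pole $(0, 0, 1)$, and the boundary circle $u^2 + v^2 = 1$ maps to the equator.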
Examples of Manifold: colors
Parametrization of color models
Color models: RGB, CMYK, Lab, HSV and so on (RGB, Lab and HSV have three channels; CMYK has four).
The RGB color model: an additive color model in which red, green, and
blue light are added together in various ways to reproduce a broad array
of colors.
The Lab color model: the three coordinates of Lab represent the lightness of the color (L), its position between red and green (a) and its position between yellow and blue (b).
Coordinate systems for manifold
Parametrization of color models
Parametrization: map of color into $\mathbb{R}^3$
Examples:
Submanifolds
Submanifolds
Definition: a submanifold M of a manifold S is a subset of S which itself has the structure of a manifold.
An open subset of an n-dimensional manifold forms an n-dimensional submanifold.
One way to construct m(
Submanifolds
Examples: color models
3-dimensional submanifold: any open subset;
2-dimensional submanifold: fix one coordinate;
1-dimensional submanifold: fix two coordinates.
Note: In the Lab color model, if we set a and b to 0 and change L from 0 to 100 (from black to white), then we get a 1-dimensional submanifold.
Basic concepts in differential geometry
Basic concepts
Manifold and Submanifold
Tangent vector, Tangent space and Vector field
Riemannian metric and Affine connection
Flatness and autoparallel
Curves and Tangent vector of Curves
Curves
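Curve $\gamma$: $I \to S$ from some interval $I\ (\subset \mathbb{R})$ to $S$.
Examples: curve on a sphere, set of probability distributions, set of linear systems.
Using a coordinate system $\{\xi^i\}$ to express the point $\gamma(t)$ on the curve (where $t \in I$): $\gamma^i(t) = \xi^i(\gamma(t))$, then we get $\bar{\gamma}(t) = [\gamma^1(t), \cdots, \gamma^n(t)]$.
$C^\infty$ curves
$C^\infty$: infinitely many times differentiable (sufficiently smooth).
If $\bar{\gamma}(t)$ is $C^\infty$ for $t \in I$, we call $\gamma$ a $C^\infty$ curve on the manifold $S$.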
Tangent vector of Curves
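A tangent vector is a vector that is tangent to a curve or surface at a given point.
When $S$ is an open subset of $\mathbb{R}^n$, the range of $\gamma$ is contained within a single linear space, hence we consider the standard derivative:
\[ \dot{\gamma}(a) = \lim_{h \to 0} \frac{\gamma(a + h) - \gamma(a)}{h} \tag{3} \]
In general, however, this is not true, e.g. the range of $\gamma$ in a color model. Thus we use a more general derivative instead:
\[ \dot{\gamma}(a) = \sum_{i=1}^{n} \dot{\gamma}^i(a) \left( \frac{\partial}{\partial \xi^i} \right)_p \tag{4} \]
where $p = \gamma(a)$, $\gamma^i(t) = \xi^i \circ \gamma(t)$, $\dot{\gamma}^i(a) = \frac{d}{dt}\gamma^i(t)\big|_{t=a}$, and $\left(\frac{\partial}{\partial \xi^i}\right)_p$ is an operator which maps $f \mapsto \left(\frac{\partial f}{\partial \xi^i}\right)_p$ for a given function $f : S \to \mathbb{R}$.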
Tangent space
Tangent space
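Tangent space at $p$: a hyperplane $T_p$ containing all the tangents of curves passing through the point $p \in S$. ($\dim T_p(S) = \dim S$)
\[ T_p(S) = \left\{ \sum_{i=1}^{n} c^i \left( \frac{\partial}{\partial \xi^i} \right)_p \;\middle|\; [c^1, \cdots, c^n] \in \mathbb{R}^n \right\} \]
Examples: hemisphere and color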
Vector fields
Vector fields
Vector fields: a map from each point in a manifold S to a tangent vector.
Consider a coordinate system $\{\xi^i\}$ for an n-dimensional manifold; clearly $\partial_i = \frac{\partial}{\partial \xi^i}$ are vector fields for $i = 1, \cdots, n$.
Riemannian Metrics
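Riemannian metric: an inner product of two tangent vectors ($D$ and $D' \in T_p(S)$) which satisfies $\langle D, D' \rangle_p \in \mathbb{R}$, and for which the following conditions hold:
Linearity: $\langle aD + bD', D'' \rangle_p = a\langle D, D'' \rangle_p + b\langle D', D'' \rangle_p$
Symmetry: $\langle D, D' \rangle_p = \langle D', D \rangle_p$
Positive definiteness: If $D \neq 0$ then $\langle D, D \rangle_p > 0$
The components $\{g_{ij}\}$ of a Riemannian metric $g$ w.r.t. the coordinate system $\{\xi^i\}$ are defined by $g_{ij} = \langle \partial_i, \partial_j \rangle$, where $\partial_i = \frac{\partial}{\partial \xi^i}$.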
Riemannian Metrics
Examples of inner product:
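For $X = (x_1, \cdots, x_n)$ and $Y = (y_1, \cdots, y_n)$, we can define the inner product as $\langle X, Y \rangle_1 = X \cdot Y = \sum_{i=1}^{n} x_i y_i$, or $\langle X, Y \rangle_2 = Y^T M X$, where $M$ is any symmetric positive-definite matrix.
For random variables $X$ and $Y$, the expected value of their product: $\langle X, Y \rangle = E(XY)$
For square real matrices: $\langle A, B \rangle = \mathrm{tr}(AB^T)$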
Riemannian Metrics
For unit sphere:
x(, ) = (sin cos , sin sin, cos), 0< < , 0
Affine connection
Parallel translation along curves
Let $\gamma : [a, b] \to S$ be a curve in $S$, and let $X(t)$ be a vector field mapping each point $\gamma(t)$ to a tangent vector. If for all $t \in [a, b]$ and the corresponding infinitesimal $dt$, the corresponding tangent vectors are linearly related, that is to say there exists a linear mapping $\Pi_{p,p'}$ such that $X(t + dt) = \Pi_{p,p'}(X(t))$ for $t \in [a, b]$, where $p = \gamma(t)$ and $p' = \gamma(t + dt)$, we say $X$ is parallel along $\gamma$, and call $\Pi$ the parallel translation along $\gamma$.
Linear mapping: additivity and scalar multiplication.
Figure: Translation of a tangent vector along a curve
Affine connection
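Affine connection: relationships between tangent spaces at different points.
Recall:
Natural basis of the coordinate system $[\xi^i]$: $(\partial_i)_p = \left(\frac{\partial}{\partial \xi^i}\right)_p$, an operator which maps $f \mapsto \left(\frac{\partial f}{\partial \xi^i}\right)_p$ for a given function $f : S \to \mathbb{R}$ at $p$.
Tangent space:
\[ T_p(S) = \left\{ \sum_{i=1}^{n} c^i \left( \frac{\partial}{\partial \xi^i} \right)_p \;\middle|\; [c^1, \cdots, c^n] \in \mathbb{R}^n \right\} \]
Tangent vectors (elements of the tangent space) can be represented as linear combinations of $\partial_i$.
Tangent space $T_p$ $\to$ Tangent vector $X_p$ $\to$ Natural basis $(\partial_i)_p = \left(\frac{\partial}{\partial \xi^i}\right)_p$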
Affine connection
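If the difference between the coordinates of $p$ and $p'$ is very small, so that we can ignore the second-order infinitesimals $(d\xi^i)(d\xi^j)$, where $d\xi^i = \xi^i(p') - \xi^i(p)$, then we can express the difference between $\Pi_{p,p'}((\partial_j)_p)$ and $(\partial_j)_{p'}$ as a linear combination of $\{d\xi^1, \cdots, d\xi^n\}$:
\[ \Pi_{p,p'}((\partial_j)_p) = (\partial_j)_{p'} - \sum_{i,k} d\xi^i \, (\Gamma^k_{ij})_p \, (\partial_k)_{p'} \tag{7} \]
where $\{(\Gamma^k_{ij})_p;\ i, j, k = 1, \cdots, n\}$ are $n^3$ numbers which depend on the point $p$.
From $X(t) = \sum_{i=1}^{n} X^i(t)(\partial_i)_p$ and $X(t + dt) = \sum_{i=1}^{n} X^i(t + dt)(\partial_i)_{p'}$, we have
\[ \Pi_{p,p'}(X(t)) = \sum_{k} \Big\{ X^k(t) - dt \sum_{i,j} \dot{\gamma}^i(t) X^j(t) (\Gamma^k_{ij})_p \Big\} (\partial_k)_{p'} \tag{8} \]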
Affine connection
\[ \Pi_{p,p'}((\partial_j)_p) = (\partial_j)_{p'} - \sum_{i,k} d\xi^i \, (\Gamma^k_{ij})_p \, (\partial_k)_{p'} \]
Connection coefficients (Christoffel symbols): $(\Gamma^k_{ij})_p$
Given a connection on the manifold $S$, the values of $(\Gamma^k_{ij})_p$ are different for different coordinate systems. They show how tangent vectors change on a manifold, and thus how the basis vectors change.
In
\[ \Pi_{p,p'}((\partial_j)_p) = (\partial_j)_{p'} - \sum_{i,k} d\xi^i \, (\Gamma^k_{ij})_p \, (\partial_k)_{p'} \]
if we let $\Gamma^k_{ij} = 0$ for $i, j, k = x, y$, we will have
\[ \Pi_{p,p'}((\partial_j)_p) = (\partial_j)_{p'} \]
Connection coefficients (Christoffel symbols): $(\Gamma^k_{ij})_p$
Given a connection on a manifold $S$, the coefficients $\Gamma^k_{ij}$ depend on the coordinate system. If we define a connection which makes $\Gamma^k_{ij}$ zero in one coordinate system, we will get non-zero connection coefficients in some other coordinate systems.
Example: If it is desired to let the connection coefficients for Cartesian coordinates of a 2D flat plane be zero, $\Gamma^k_{ij} = 0$ for $i, j, k = x, y$, we can calculate the connection coefficients for polar coordinates: $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = \frac{1}{r}$, $\Gamma^r_{\theta\theta} = -r$, and $\Gamma^k_{ij} = 0$ for all others.
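The polar-coordinate coefficients in this example can be checked numerically. The sketch below assumes the standard transformation rule for connection coefficients: if they vanish in Cartesian coordinates $x^a$, then in polar coordinates $\rho = (r, \theta)$ they are $\Gamma^k_{ij} = \sum_a \frac{\partial \rho^k}{\partial x^a} \frac{\partial^2 x^a}{\partial \rho^i \partial \rho^j}$. The function names, the step size `h` and the evaluation point are illustrative assumptions.

```python
import math

def to_cartesian(r, theta):
    """Polar coordinates (r, theta) -> Cartesian coordinates (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

def polar_christoffel(r, theta, h=1e-4):
    """Return gamma[k][i][j] (index 0 = r, 1 = theta), assuming all
    connection coefficients vanish in Cartesian coordinates."""
    rho = (r, theta)

    def second_partial(a, i, j):
        # Central difference for d^2 x^a / (d rho^i d rho^j).
        def x_a(di, dj):
            q = list(rho)
            q[i] += di
            q[j] += dj
            return to_cartesian(*q)[a]
        return (x_a(h, h) - x_a(h, -h) - x_a(-h, h) + x_a(-h, -h)) / (4 * h * h)

    # Inverse Jacobian d rho^k / d x^a, written out analytically.
    inv_jac = [
        [math.cos(theta), math.sin(theta)],
        [-math.sin(theta) / r, math.cos(theta) / r],
    ]
    return [[[sum(inv_jac[k][a] * second_partial(a, i, j) for a in range(2))
              for j in range(2)] for i in range(2)] for k in range(2)]

gamma = polar_christoffel(2.0, 0.7)
print(gamma[0][1][1], gamma[1][0][1])  # compare with -r and 1/r at r = 2
```

Finite differences are used only to keep the sketch dependency-free; a symbolic computation would give the closed forms directly.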
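Connection coefficients (Christoffel symbols): $(\Gamma^k_{ij})_p$
Example (cont.):
Now if we want to let the connection coefficients for polar coordinates be zero, $\Gamma^k_{ij} = 0$ for $i, j, k = r, \theta$, we can calculate the connection coefficients for Cartesian coordinates:
$\Gamma^x_{xx} = -\frac{\sin^2\theta \cos\theta}{r}$, $\Gamma^y_{xx} = \frac{\sin\theta (1 + \cos^2\theta)}{r}$,
$\Gamma^x_{xy} = \Gamma^x_{yx} = -\frac{\sin^3\theta}{r}$, $\Gamma^y_{xy} = \Gamma^y_{yx} = -\frac{\cos^3\theta}{r}$,
$\Gamma^x_{yy} = \frac{\cos\theta (1 + \sin^2\theta)}{r}$, and $\Gamma^y_{yy} = -\frac{\sin\theta \cos^2\theta}{r}$.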
Affine connection
Covariant derivative along curves
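Derivative: $\frac{dX(t)}{dt} = \lim_{dt \to 0} \frac{X(t + dt) - X(t)}{dt}$. What if $X(t)$ and $X(t + dt)$ lie in different tangent spaces?
\[ X_t(t + dt) = \Pi_{\gamma(t+dt), \gamma(t)}(X(t + dt)) \]
\[ \delta X(t) = X_t(t + dt) - X(t) = \Pi_{\gamma(t+dt), \gamma(t)}(X(t + dt)) - X(t) \]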
Affine connection
Covariant derivative along curves
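We call $\frac{\delta X(t)}{dt}$ the covariant derivative of $X(t)$:
\[ \frac{\delta X(t)}{dt} = \lim_{dt \to 0} \frac{X_t(t + dt) - X(t)}{dt} = \lim_{dt \to 0} \frac{\Pi_{\gamma(t+dt), \gamma(t)}(X(t + dt)) - X(t)}{dt} \tag{9} \]
\[ \Pi_{\gamma(t+dt), \gamma(t)}(X(t + dt)) = \sum_{k} \Big\{ X^k(t + dt) + dt \sum_{i,j} \dot{\gamma}^i(t) X^j(t) (\Gamma^k_{ij})_{\gamma(t)} \Big\} (\partial_k)_{\gamma(t)} \tag{10} \]
\[ \frac{\delta X(t)}{dt} = \sum_{k} \Big\{ \dot{X}^k(t) + \sum_{i,j} \dot{\gamma}^i(t) X^j(t) (\Gamma^k_{ij})_{\gamma(t)} \Big\} (\partial_k)_{\gamma(t)} \tag{11} \]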
Affine connection
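Covariant derivative for any two vector fields
Covariant derivative of $Y$ w.r.t. $X$, where $X = \sum_{i=1}^{n} X^i \partial_i$ and $Y = \sum_{i=1}^{n} Y^i \partial_i$:
\[ \nabla_X Y = \sum_{i,k} X^i \Big\{ \partial_i Y^k + \sum_{j} Y^j \Gamma^k_{ij} \Big\} \partial_k \tag{12} \]
\[ \nabla_{\partial_i} \partial_j = \sum_{k=1}^{n} \Gamma^k_{ij} \partial_k \tag{13} \]
Note: $(\nabla_X Y)_p = \nabla_{X_p} Y \in T_p(S)$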
Examples of Affine connection
metric connection
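Definition: If for all vector fields $X, Y, Z \in \mathcal{T}(S)$,
\[ Z\langle X, Y \rangle = \langle \nabla_Z X, Y \rangle + \langle X, \nabla_Z Y \rangle, \]
where $Z\langle X, Y \rangle$ denotes the derivative of the function $\langle X, Y \rangle$ along the vector field $Z$, we say that $\nabla$ is a metric connection w.r.t. $g$.
Equivalent condition: for all basis vector fields $\partial_i, \partial_j, \partial_k \in \mathcal{T}(S)$,
\[ \partial_k \langle \partial_i, \partial_j \rangle = \langle \nabla_{\partial_k} \partial_i, \partial_j \rangle + \langle \partial_i, \nabla_{\partial_k} \partial_j \rangle. \]
Property: parallel translation under a metric connection preserves inner products, which means parallel transport is an isometry.
\[ \langle \Pi_\gamma(D_1), \Pi_\gamma(D_2) \rangle_q = \langle D_1, D_2 \rangle_p. \]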
Examples of Affine connection
Levi-Civita connection
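For a given connection, when $\Gamma^k_{ij} = \Gamma^k_{ji}$ holds for all $i$, $j$ and $k$, we call it a symmetric connection or torsion-free connection.
From $\nabla_{\partial_i} \partial_j = \sum_{k=1}^{n} \Gamma^k_{ij} \partial_k$, we know for a symmetric connection:
\[ \nabla_{\partial_i} \partial_j = \nabla_{\partial_j} \partial_i \]
If a connection is both metric and symmetric, we call it the Riemannian connection or the Levi-Civita connection w.r.t. $g$.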
Basic concepts in differential geometry
Basic concepts
Manifold and Submanifold
Tangent vector, Tangent space and Vector field
Riemannian metric and Affine connection
Flatness and autoparallel
Flatness
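Affine coordinate system
Let $\{\xi^i\}$ be a coordinate system for $S$. We call $\{\xi^i\}$ an affine coordinate system for the connection $\nabla$ if the $n$ basis vector fields $\partial_i = \frac{\partial}{\partial \xi^i}$ are all parallel on $S$.
Equivalent conditions for a coordinate system to be an affine coordinate system:
\[ \nabla_{\partial_i} \partial_j = \sum_{k=1}^{n} \Gamma^k_{ij} \partial_k = 0 \quad \text{for all } i \text{ and } j \tag{14} \]
\[ \Gamma^k_{ij} = 0 \quad \text{for all } i, j \text{ and } k \tag{15} \]
Flatness
$S$ is flat w.r.t. the connection $\nabla$: an affine coordinate system exists for the connection $\nabla$.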
Flatness
Examples:
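\[ \nabla_{\partial_i} \partial_j = \sum_{k=1}^{n} \Gamma^k_{ij} \partial_k = 0 \quad \text{for all } i \text{ and } j \]
\[ \Gamma^k_{ij} = 0 \quad \text{for all } i, j \text{ and } k \]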
Flatness
Curvature R and torsion T of a connection
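\[ R(\partial_i, \partial_j)\partial_k = \sum_{l} R^l_{ijk} \partial_l \quad \text{and} \quad T(\partial_i, \partial_j) = \sum_{k} T^k_{ij} \partial_k \tag{16} \]
where $R^l_{ijk}$ and $T^k_{ij}$ can be computed in the following way (with summation over the repeated index $h$):
\[ R^l_{ijk} = \partial_i \Gamma^l_{jk} - \partial_j \Gamma^l_{ik} + \Gamma^l_{ih} \Gamma^h_{jk} - \Gamma^l_{jh} \Gamma^h_{ik} \tag{17} \]
\[ T^k_{ij} = \Gamma^k_{ij} - \Gamma^k_{ji} \tag{18} \]
If a connection is flat, then $T = R = 0$;
If $T = 0$, i.e. $T^k_{ij} = 0$ for all $i$, $j$ and $k$, we get a symmetric connection.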
Flatness
Curvature
Curvature $R = 0$ iff parallel translation does not depend on curve choice.
Curvature is independent of the coordinate system; under the Riemannian connection, we can calculate:
Curvature of a 2-dimensional plane: $R = 0$;
Curvature of a sphere of radius $r$ in 3-dimensional space: $R = \frac{2}{r^2}$.
Autoparallel submanifold
Geodesics
Geodesics (autoparallel curves): a curve whose tangent vector is transported along the curve by parallel translation.
Examples under the Riemannian connection:
2-dimensional flat plane: straight line
Sphere in 3-dimensional space: great circle
Geodesics
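The geodesics with respect to the Riemannian connection are known to coincide with the shortest curve joining two points.
Shortest curve: curve with the shortest length.
Length of a curve $\gamma : [a, b] \to S$:
\[ \|\gamma\| = \int_a^b \left\| \frac{d\gamma}{dt} \right\| dt = \int_a^b \sqrt{g_{ij} \dot{\gamma}^i \dot{\gamma}^j} \, dt \tag{22} \]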
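The length integral (22) can be approximated numerically. The sketch below assumes the standard round metric on the unit sphere in coordinates $(\theta, \varphi)$, $g = \mathrm{diag}(1, \sin^2\theta)$, and compares a latitude-circle path with the great-circle distance between the same endpoints. All names, the midpoint rule and the chosen endpoints are illustrative assumptions.

```python
import math

def curve_length(curve, velocity, metric, a, b, steps=2000):
    """Midpoint-rule approximation of the length integral (22)."""
    dt = (b - a) / steps
    total = 0.0
    for s in range(steps):
        t = a + (s + 0.5) * dt
        g = metric(curve(t))
        v = velocity(t)
        n = len(v)
        speed_sq = sum(g[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
        total += math.sqrt(speed_sq) * dt
    return total

def sphere_metric(point):
    """Assumed round metric on the unit sphere in (theta, phi) coordinates."""
    theta, _phi = point
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def embed(theta, phi):
    """Embed (theta, phi) as a unit vector in R^3."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

# Path along the latitude circle theta = pi/3, with phi running from 0 to pi.
latitude_length = curve_length(lambda t: (math.pi / 3, t),
                               lambda t: (0.0, 1.0),
                               sphere_metric, 0.0, math.pi)

# Great-circle distance between the same endpoints (angle between unit vectors).
p0, p1 = embed(math.pi / 3, 0.0), embed(math.pi / 3, math.pi)
great_circle_length = math.acos(sum(x * y for x, y in zip(p0, p1)))

print(latitude_length, great_circle_length)
```

The latitude circle is not a geodesic, so its length exceeds the great-circle distance, in line with the statement that geodesics of the Riemannian connection are shortest curves.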
Part II
Geometric structure of statistical models and statistical
inference
Motivation
Consider the set of probability distributions as a manifold.
Analyze the relationship between the geometric structure of the manifold and statistical estimation.
Introduce concepts such as a metric and an affine connection on statistical models, and study quantities such as distance, the tangent space (which provides linear approximations), geodesics and the curvature of a manifold.
Statistical models
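\[ \mathcal{P}(\mathcal{X}) = \left\{ p : \mathcal{X} \to \mathbb{R} \;\middle|\; p(x) > 0 \ (\forall x \in \mathcal{X}),\ \int p(x)\,dx = 1 \right\} \tag{23} \]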
Example Normal Distribution:
X = R, n= 2, = [, ], ={[, ]| <
Basic concepts
The Fisher metric and $\alpha$-connection
Exponential family
Divergence and Geometric statistical inference
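Fisher information matrix
$G(\xi) = [g_{i,j}(\xi)]$, and
\[ g_{i,j}(\xi) = E_\xi[\partial_i \ell_\xi \, \partial_j \ell_\xi] = \int \partial_i \ell(x; \xi) \, \partial_j \ell(x; \xi) \, p(x; \xi) \, dx \]
where $\ell_\xi = \ell(x; \xi) = \log p(x; \xi)$ and $E_\xi$ denotes the expectation w.r.t. the distribution $p_\xi$.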
Motivation:
Sufficient statistic and Cramer-Rao bound
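A small numerical sketch of the Fisher information matrix, assuming the normal model with $\xi = [\mu, \sigma]$. The score $\partial_i \ell$ is approximated by central differences and the expectation by a midpoint rule; for this model the closed form is $G = \mathrm{diag}(1/\sigma^2,\ 2/\sigma^2)$. Function names, step sizes and the integration window are illustrative assumptions.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Log-density l(x; xi) of the normal model with xi = [mu, sigma]."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

def score(x, xi, h=1e-5):
    """Central-difference approximation of the gradient of l(x; xi) w.r.t. xi."""
    grad = []
    for i in range(len(xi)):
        up, down = list(xi), list(xi)
        up[i] += h
        down[i] -= h
        grad.append((log_normal_pdf(x, *up) - log_normal_pdf(x, *down)) / (2 * h))
    return grad

def fisher_matrix(xi, steps=4000, width=12.0):
    """Midpoint-rule approximation of g_ij = E[d_i l * d_j l]."""
    mu, sigma = xi
    lo = mu - width * sigma
    dx = 2 * width * sigma / steps
    n = len(xi)
    g = [[0.0] * n for _ in range(n)]
    for s in range(steps):
        x = lo + (s + 0.5) * dx
        weight = math.exp(log_normal_pdf(x, mu, sigma)) * dx
        d = score(x, xi)
        for i in range(n):
            for j in range(n):
                g[i][j] += d[i] * d[j] * weight
    return g

G = fisher_matrix([1.0, 2.0])
print(G)  # compare with diag(1/sigma^2, 2/sigma^2) for sigma = 2
```

The matrix is symmetric and positive definite here, which is what allows it to serve as a Riemannian metric on the parameter space.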
Sufficient statistic
Sufficient statistic: for $Y = F(X)$, given the distribution $p(x; \xi)$ of $X$, we have $p(x; \xi) = q(F(x); \xi)\, r(x; \xi)$. If $r(x; \xi)$ does not depend on $\xi$ for all $x$, we say that $F$ is a sufficient statistic for the model $S$. Then we can write $p(x; \xi) = q(y; \xi)\, r(x)$.
A sufficient statistic is a function whose value contains all the information needed to compute any estimate of the parameter (e.g. a maximum likelihood estimate).
Fisher information matrix and sufficient statistic
Let $G(\xi)$ be the Fisher information matrix of $S = \{p(x; \xi)\}$, and $G_F(\xi)$ be the Fisher information matrix of the induced model $S_F = \{q(y; \xi)\}$. Then we have $G_F(\xi) \le G(\xi)$ in the sense that $\Delta G(\xi) = G(\xi) - G_F(\xi)$ is positive semidefinite. $\Delta G(\xi) = 0$ iff $F$ is a sufficient statistic for $S$.
Cramer-Rao inequality
The variance of any unbiased estimator is at least as high as the inverse of the
Fisher information.
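Unbiased estimator $\hat{\xi}$: $E_\xi[\hat{\xi}(X)] = \xi$
The variance-covariance matrix $V_\xi[\hat{\xi}] = [v^{ij}_\xi]$ where
\[ v^{ij}_\xi = E_\xi\big[(\hat{\xi}^i(X) - \xi^i)(\hat{\xi}^j(X) - \xi^j)\big] \]
Thus the Cramer-Rao inequality states that $V_\xi[\hat{\xi}] \ge G(\xi)^{-1}$, and an unbiased estimator satisfying $V_\xi[\hat{\xi}] = G(\xi)^{-1}$ is called an efficient estimator.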
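To illustrate an efficient estimator, the sketch below uses a model that is not in the slides: $n$ independent Bernoulli($p$) trials, whose Fisher information is $n / (p(1 - p))$. It computes the exact mean and variance of the sample mean by summing over the binomial distribution and compares the variance with the Cramer-Rao bound. Function names and the values $p = 0.3$, $n = 20$ are assumptions.

```python
import math

def sample_mean_moments(p, n):
    """Exact mean and variance of the sample mean of n Bernoulli(p) trials."""
    mean = 0.0
    second_moment = 0.0
    for k in range(n + 1):
        prob = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        estimate = k / n
        mean += prob * estimate
        second_moment += prob * estimate ** 2
    return mean, second_moment - mean ** 2

def fisher_information(p, n):
    """Fisher information of n independent Bernoulli(p) observations."""
    return n / (p * (1 - p))

mean, variance = sample_mean_moments(0.3, 20)
bound = 1.0 / fisher_information(0.3, 20)
print(mean, variance, bound)
```

The estimator is unbiased and its variance matches the bound up to rounding, so in this one-parameter model the sample mean is an efficient estimator.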
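$\alpha$-connection
Let $S = \{p_\xi\}$ be an n-dimensional model, and consider the function $\Gamma^{(\alpha)}_{ij,k}$ which maps each point $\xi$ to the following value:
\[ (\Gamma^{(\alpha)}_{ij,k})_\xi = E_\xi\left[ \left( \partial_i \partial_j \ell_\xi + \frac{1 - \alpha}{2} \partial_i \ell_\xi \, \partial_j \ell_\xi \right) (\partial_k \ell_\xi) \right] \tag{24} \]
where $\alpha$ is an arbitrary real number. We define an affine connection $\nabla^{(\alpha)}$ which satisfies:
\[ \langle \nabla^{(\alpha)}_{\partial_i} \partial_j, \partial_k \rangle = \Gamma^{(\alpha)}_{ij,k} \tag{25} \]
where $g = \langle \cdot, \cdot \rangle$ is the Fisher metric. We call $\nabla^{(\alpha)}$ the $\alpha$-connection.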
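Properties of $\alpha$-connection
The $\alpha$-connection is a symmetric connection.
Relationship between the $\alpha$-connection and the $\beta$-connection:
\[ \Gamma^{(\beta)}_{ij,k} = \Gamma^{(\alpha)}_{ij,k} + \frac{\alpha - \beta}{2} E[\partial_i \ell \, \partial_j \ell \, \partial_k \ell] \]
The 0-connection is the Riemannian connection with respect to the Fisher metric.
\[ \Gamma^{(\alpha)}_{ij,k} = \Gamma^{(0)}_{ij,k} + \frac{-\alpha}{2} E[\partial_i \ell \, \partial_j \ell \, \partial_k \ell] \]
\[ \nabla^{(\alpha)} = \frac{1 + \alpha}{2} \nabla^{(1)} + \frac{1 - \alpha}{2} \nabla^{(-1)} \]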
Basic concepts
The Fisher metric and $\alpha$-connection
Exponential family
Divergence and Geometric statistical inference
Exponential family
\[ p(x; \theta) = \exp\left[ C(x) + \sum_{i=1}^{n} \theta^i F_i(x) - \psi(\theta) \right] \]
$[\theta^i]$ are called the natural parameters (coordinates), and $\psi$ is the potential function for $[\theta^i]$, which can be calculated as
\[ \psi(\theta) = \log \int \exp\left[ C(x) + \sum_{i=1}^{n} \theta^i F_i(x) \right] dx \]
The exponential families include many of the most common distributions, including the normal, exponential, gamma, beta, Dirichlet, Bernoulli, binomial, multinomial, Poisson, and so on.
Exponential family
Examples: Normal Distribution
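\[ p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \tag{26} \]
where $C(x) = 0$, $F_1(x) = x$, $F_2(x) = x^2$, and $\theta^1 = \frac{\mu}{\sigma^2}$, $\theta^2 = -\frac{1}{2\sigma^2}$ are the natural parameters. The potential function is:
\[ \psi = -\frac{(\theta^1)^2}{4\theta^2} + \frac{1}{2} \log\left( -\frac{\pi}{\theta^2} \right) = \frac{\mu^2}{2\sigma^2} + \log(\sqrt{2\pi}\,\sigma) \tag{27} \]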
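The identities in (26) and (27) are easy to verify numerically. The sketch below evaluates the normal density once in its standard form and once in exponential-family form, and checks that the two expressions for $\psi$ agree. Function names and the test values $\mu = 1$, $\sigma = 2$ are illustrative assumptions.

```python
import math

def natural_parameters(mu, sigma):
    """theta^1 = mu / sigma^2, theta^2 = -1 / (2 sigma^2)."""
    return mu / sigma ** 2, -1.0 / (2 * sigma ** 2)

def potential_from_theta(theta1, theta2):
    """psi written in the natural parameters, first form in (27)."""
    return -theta1 ** 2 / (4 * theta2) + 0.5 * math.log(-math.pi / theta2)

def potential_from_mu_sigma(mu, sigma):
    """psi written in (mu, sigma), second form in (27)."""
    return mu ** 2 / (2 * sigma ** 2) + math.log(math.sqrt(2 * math.pi) * sigma)

def density_standard_form(x, mu, sigma):
    """Normal density as in (26)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def density_exponential_form(x, mu, sigma):
    """exp[theta^1 F_1(x) + theta^2 F_2(x) - psi(theta)] with F_1 = x, F_2 = x^2."""
    theta1, theta2 = natural_parameters(mu, sigma)
    return math.exp(theta1 * x + theta2 * x * x - potential_from_theta(theta1, theta2))

mu, sigma = 1.0, 2.0
theta1, theta2 = natural_parameters(mu, sigma)
print(potential_from_theta(theta1, theta2), potential_from_mu_sigma(mu, sigma))
print(density_standard_form(0.5, mu, sigma), density_exponential_form(0.5, mu, sigma))
```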
Mixture family
\[ p(x; \theta) = C(x) + \sum_{i=1}^{n} \theta^i F_i(x) \]
In this case we say that $S$ is a mixture family and $[\theta^i]$ are called the mixture parameters.
e-connection and m-connection
The natural parameters of an exponential family form a 1-affine coordinate system ($\Gamma^{(1)}_{ij,k} = 0$), which means the exponential family is 1-flat. We call the connection $\nabla^{(1)}$ the e-connection, and call the exponential family e-flat.
The mixture parameters of a mixture family form a (-1)-affine coordinate system ($\Gamma^{(-1)}_{ij,k} = 0$), which means the mixture family is (-1)-flat. We call the connection $\nabla^{(-1)}$ the m-connection, and call the mixture family m-flat.
Dual connection
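Definition: Let $S$ be a manifold on which there is given a Riemannian metric $g$ and two affine connections $\nabla$ and $\nabla^*$. If for all vector fields $X, Y, Z \in \mathcal{T}(S)$,
\[ Z\langle X, Y \rangle = \langle \nabla_Z X, Y \rangle + \langle X, \nabla^*_Z Y \rangle \tag{28} \]
holds, we say that $\nabla$ and $\nabla^*$ are duals of each other w.r.t. $g$ and call one the dual connection of the other.
Additionally, we call the triple $(g, \nabla, \nabla^*)$ a dualistic structure on $S$.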
Properties
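For any statistical model, the $\alpha$-connection and the $(-\alpha)$-connection are dual with respect to the Fisher metric.
\[ \langle \Pi_\gamma(D_1), \Pi^*_\gamma(D_2) \rangle_q = \langle D_1, D_2 \rangle_p, \]
where $\Pi_\gamma$ and $\Pi^*_\gamma$ are the parallel translations along $\gamma$ w.r.t. $\nabla$ and $\nabla^*$.
\[ R = 0 \iff R^* = 0 \]
where $R$ and $R^*$ are the curvature tensors of $\nabla$ and $\nabla^*$.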
Dually flat spaces
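Let $(g, \nabla, \nabla^*)$ be a dualistic structure on a manifold $S$. Then we have $R = 0 \iff R^* = 0$, and if the connections $\nabla$ and $\nabla^*$ are both symmetric ($T = T^* = 0$), then we see that $\nabla$-flatness and $\nabla^*$-flatness are equivalent.
We call $(S, g, \nabla, \nabla^*)$ a dually flat space if both duals $\nabla$ and $\nabla^*$ are flat.
Examples: Since $\alpha$-connections and $(-\alpha)$-connections are dual w.r.t. the Fisher metric and $\alpha$-connections are symmetric, we have for any statistical model $S$ and for any real number $\alpha$
\[ S \text{ is } \alpha\text{-flat} \iff S \text{ is } (-\alpha)\text{-flat} \tag{29} \]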
Dual coordinate system
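For a particular $\nabla$-affine coordinate system $[\theta^i]$, suppose we choose a corresponding $\nabla^*$-affine coordinate system $[\eta_j]$ such that
\[ g(\partial_i, \partial^j) = \langle \partial_i, \partial^j \rangle = \delta_i^j \]
where $\partial_i = \frac{\partial}{\partial \theta^i}$ and $\partial^j = \frac{\partial}{\partial \eta_j}$.
Then we say the two coordinate systems are mutually dual w.r.t. the metric $g$, and call one the dual coordinate system of the other.
Existence of dual coordinate system
A pair of dual coordinate systems exists if and only if $(S, g, \nabla, \nabla^*)$ is a dually flat space.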
Legendre transformations
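Consider mutually dual coordinate systems $[\theta^i]$ and $[\eta_i]$ with functions $\psi : S \to \mathbb{R}$ and $\varphi : S \to \mathbb{R}$ that satisfy the following equations:
\[ \partial_i \psi = \eta_i \]
\[ \partial^i \varphi = \theta^i \]
\[ g_{i,j} = \partial_i \eta_j = \partial_j \eta_i = \partial_i \partial_j \psi \]
\[ \varphi(\eta) = \max_{\theta} \{ \theta^i \eta_i - \psi(\theta) \} \]
\[ \psi(\theta) = \max_{\eta} \{ \theta^i \eta_i - \varphi(\eta) \} \]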
Geometric interpretation for $f^*(p) = \max_x (px - f(x))$: A convex function $f(x)$ is shown in red, and the tangent line at the point $(x_0, f(x_0))$ is shown in blue. The tangent line intersects the vertical axis at $(0, -f^*)$ and $f^*$ is the value of the Legendre transform $f^*(p_0)$, where $p_0 = f'(x_0)$. Note that for any other point on the red curve, a line drawn through that point with the same slope as the blue line will have a y-intercept above the point $(0, -f^*)$, showing that $f^*$ is indeed a maximum.
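The tangent-line picture can be reproduced numerically. The sketch below approximates $f^*(p) = \max_x (px - f(x))$ by a grid search for the assumed example $f(x) = x^2 / 2$ (whose transform is $f^*(p) = p^2 / 2$), and checks that the tangent line at $x_0$ meets the vertical axis at $-f^*(p_0)$. The function names, grid bounds and the choice $x_0 = 1.5$ are illustrative assumptions.

```python
def legendre_transform(f, p, lo=-10.0, hi=10.0, steps=20000):
    """Grid-search approximation of f*(p) = max_x (p*x - f(x)) on [lo, hi]."""
    best = float("-inf")
    for s in range(steps + 1):
        x = lo + (hi - lo) * s / steps
        best = max(best, p * x - f(x))
    return best

def f(x):
    """Assumed convex example function."""
    return 0.5 * x * x

x0 = 1.5
p0 = x0                        # slope f'(x0) for f(x) = x^2 / 2
f_star = legendre_transform(f, p0)
intercept = f(x0) - p0 * x0    # y-intercept of the tangent line at x0
print(f_star, intercept)
```

A grid search is used only for transparency; for smooth convex $f$ one would usually solve $f'(x) = p$ instead.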
Examples of Legendre transformations
Examples:
The Legendre transform off(x) = 1p|x|p (where 1
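For the distribution $p(x; \theta) = \exp\left[ C(x) + \sum_{i=1}^{n} \theta^i F_i(x) - \psi(\theta) \right]$, $[\theta^i]$ are called the natural parameters.
If we define $\eta_i = E_\theta[F_i] = \int F_i(x)\, p(x; \theta)\, dx$, we can verify that $[\eta_i]$ is a (-1)-affine coordinate system dual to $[\theta^i]$. We call this $[\eta_i]$ the expectation parameters or the dual parameters.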
The natural parameter and dual parameter of Exponential family
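Recall: Normal Distribution
$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ (30)
where $C(x) = 0$, $F_1(x) = x$, $F_2(x) = x^2$, and $\theta^1 = \frac{\mu}{\sigma^2}$, $\theta^2 = -\frac{1}{2\sigma^2}$ are the
natural parameters. The potential function is:
$\psi = -\frac{(\theta^1)^2}{4\theta^2} + \frac{1}{2}\log\left(-\frac{\pi}{\theta^2}\right) = \frac{\mu^2}{2\sigma^2} + \log(\sqrt{2\pi}\,\sigma)$ (31)
The dual parameters are calculated as $\eta_1 = \frac{\partial\psi}{\partial\theta^1} = \mu = -\frac{\theta^1}{2\theta^2}$ and
$\eta_2 = \frac{\partial\psi}{\partial\theta^2} = \mu^2 + \sigma^2 = \frac{(\theta^1)^2 - 2\theta^2}{4(\theta^2)^2}$. The dual potential function is:
$\varphi = -\frac{1}{2}\left(1 + \log\left(-\frac{\pi}{\theta^2}\right)\right) = -\frac{1}{2}\left(1 + \log(2\pi) + 2\log\sigma\right)$ (32)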
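As a sanity check on equations (31) and (32), the sketch below differentiates the potential $\psi$ numerically and compares the result with the expectation parameters $\mu$ and $\mu^2 + \sigma^2$. The helper names (`natural_params`, `eta_numeric`) and the test values $\mu = 1$, $\sigma = 2$ are assumptions made for illustration, and central finite differences stand in for the exact derivatives.

```python
import math

def natural_params(mu, sigma):
    """theta^1 = mu / sigma^2, theta^2 = -1 / (2 sigma^2)."""
    return mu / sigma**2, -1.0 / (2.0 * sigma**2)

def psi(t1, t2):
    """Potential function of equation (31) in natural coordinates."""
    return -(t1 * t1) / (4.0 * t2) + 0.5 * math.log(-math.pi / t2)

def eta_numeric(t1, t2, h=1e-6):
    """Central finite-difference approximation of (d psi/d theta^1, d psi/d theta^2)."""
    d1 = (psi(t1 + h, t2) - psi(t1 - h, t2)) / (2.0 * h)
    d2 = (psi(t1, t2 + h) - psi(t1, t2 - h)) / (2.0 * h)
    return d1, d2

mu, sigma = 1.0, 2.0                      # illustrative values
t1, t2 = natural_params(mu, sigma)
eta1, eta2 = eta_numeric(t1, t2)

# Dual potential via the Legendre relation phi = theta^i eta_i - psi,
# compared with the closed form of equation (32).
phi_legendre = t1 * eta1 + t2 * eta2 - psi(t1, t2)
phi_closed = -0.5 * (1.0 + math.log(2.0 * math.pi) + 2.0 * math.log(sigma))

print(eta1, eta2, phi_legendre, phi_closed)
```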
Geometric structure of statistical models and statistical inference
Basic concepts
The Fisher metric and α-connection
Exponential family
Divergence and Geometric statistical inference
Divergence, semimetrics and metrics
A distance satisfying positive-definiteness, symmetry and the triangle
inequality is called a metric;
A distance satisfying positive-definiteness and symmetry is called a semimetric;
A distance satisfying only positive-definiteness is called a divergence.
Kullback-Leibler divergence
Discrete random variables p and q:
$D_{KL}(p\|q) = \sum_i p(x_i) \log\frac{p(x_i)}{q(x_i)}$ (34)
Continuous random variables p and q:
$D_{KL}(p\|q) = \int p(x) \log\frac{p(x)}{q(x)}\, dx$ (35)
Generally, we use the Kullback-Leibler divergence to measure the difference
between two probability distributions p and q. KL measures the expected
number of extra bits required to code samples from p when using a code based on q, rather than a code based on p. Typically p represents the
true distribution of data, observations, or a precisely calculated theoretical
distribution. The measure q typically represents a theory, model, description,
or approximation of p.
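A minimal sketch of the discrete formula (34), using base-2 logarithms so that the result is in bits. The function name `kl_divergence` and the two example distributions are assumptions chosen for illustration; the code also assumes q(x) > 0 wherever p(x) > 0.

```python
import math

def kl_divergence(p, q, base=2.0):
    """D_KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / qi, base)
    return total

p = [0.5, 0.25, 0.25]          # "true" distribution (illustrative)
q = [1 / 3, 1 / 3, 1 / 3]      # model distribution (illustrative)

d_pq = kl_divergence(p, q)     # extra bits when coding p with a code built for q
d_qp = kl_divergence(q, p)     # generally a different number

print(d_pq, d_qp)
```

The two directions differ, which is why KL is a divergence in the sense defined earlier rather than a semimetric or metric.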
Bregman divergence
The Bregman divergence associated with F for points x, y is:
$B_F(x\|y) = F(y) - F(x) - \langle y - x, \nabla F(x) \rangle$,
where F(x) is a convex function defined on a closed convex set.
Examples:
$F(x) = \|x\|^2$, then $B_F(x\|y) = \|x - y\|^2$.
More generally, if $F(x) = \frac{1}{2} x^T A x$, then $B_F(x\|y) = \frac{1}{2}(x - y)^T A (x - y)$.
KL divergence: if $F = \sum_i x_i \log x_i - \sum_i x_i$, we get the Kullback-Leibler divergence.
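The two examples can be reproduced with a short sketch that follows the slide's convention, in which the gradient is taken at the first argument x. All function names and the sample points are assumptions for illustration. Under this convention, and for vectors that each sum to 1, the negative-entropy generator yields the KL divergence with y as its first argument.

```python
import math

def bregman(F, grad_F, x, y):
    """B_F(x || y) = F(y) - F(x) - <y - x, grad F(x)> (slide convention)."""
    g = grad_F(x)
    inner = sum((yi - xi) * gi for xi, yi, gi in zip(x, y, g))
    return F(y) - F(x) - inner

def sq_norm(x):
    return sum(xi * xi for xi in x)

def grad_sq_norm(x):
    return [2.0 * xi for xi in x]

def neg_entropy(x):
    # F(x) = sum_i x_i log x_i - sum_i x_i, defined for strictly positive x_i.
    return sum(xi * math.log(xi) - xi for xi in x)

def grad_neg_entropy(x):
    return [math.log(xi) for xi in x]

def kl(p, q):
    """Natural-log KL divergence for strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

x = [0.2, 0.3, 0.5]            # illustrative probability vectors
y = [0.4, 0.4, 0.2]

b_sq = bregman(sq_norm, grad_sq_norm, x, y)            # squared Euclidean distance
b_ent = bregman(neg_entropy, grad_neg_entropy, x, y)   # equals kl(y, x) here

print(b_sq, sum((a - b) ** 2 for a, b in zip(x, y)))
print(b_ent, kl(y, x))
```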
Canonical divergence
Canonical divergence (a divergence for dually flat spaces)
Let $(S, g, \nabla, \nabla^*)$ be a dually flat space, and let $\{[\theta^i], [\eta_j]\}$ be mutually dual affine coordinate systems with potentials $\{\psi, \varphi\}$. Then the canonical divergence (the $(g, \nabla)$-divergence) is defined as:
$D(p\|q) = \psi(p) + \varphi(q) - \theta^i(p)\eta_i(q)$ (36)
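To make definition (36) concrete, the sketch below evaluates it for two univariate normal distributions, using the natural parameters, expectation parameters and potentials of the normal family in their $(\mu, \sigma)$ forms. For an exponential family with these coordinates, the result should agree with the closed-form Gaussian KL divergence taken with the arguments swapped, $D_{KL}(q\|p)$. The function names and the two test distributions are assumptions for illustration.

```python
import math

def theta(mu, sigma):
    """Natural parameters of N(mu, sigma^2)."""
    return (mu / sigma**2, -1.0 / (2.0 * sigma**2))

def eta(mu, sigma):
    """Expectation parameters E[x], E[x^2]."""
    return (mu, mu**2 + sigma**2)

def psi(mu, sigma):
    """Potential psi, written in terms of (mu, sigma)."""
    return mu**2 / (2.0 * sigma**2) + math.log(math.sqrt(2.0 * math.pi) * sigma)

def phi(mu, sigma):
    """Dual potential phi, written in terms of sigma."""
    return -0.5 * (1.0 + math.log(2.0 * math.pi) + 2.0 * math.log(sigma))

def canonical_divergence(p, q):
    """D(p || q) = psi(p) + phi(q) - theta^i(p) eta_i(q), equation (36)."""
    tp, eq = theta(*p), eta(*q)
    return psi(*p) + phi(*q) - (tp[0] * eq[0] + tp[1] * eq[1])

def kl_gauss(a, b):
    """Closed-form D_KL(a || b) for univariate normals given as (mu, sigma)."""
    (m1, s1), (m2, s2) = a, b
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2.0 * s2**2) - 0.5

p = (0.0, 1.0)   # (mu, sigma), illustrative
q = (1.0, 2.0)

print(canonical_divergence(p, q), kl_gauss(q, p))
```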
Canonical divergence
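Properties:
Relation between the $(g, \nabla)$-divergence $D$ and the $(g, \nabla^*)$-divergence $D^*$: $D^*(p\|q) = D(q\|p)$.
If M is an autoparallel submanifold w.r.t. either $\nabla$ or $\nabla^*$, then the $(g_M, \nabla_M)$-divergence $D_M = D|_{M \times M}$ is given by $D_M(p\|q) = D(p\|q)$.
If $\nabla$ is a Riemannian connection ($\nabla = \nabla^*$) which is flat on S, there exists a coordinate system which is self-dual ($\theta^i = \eta_i$). Then $\psi = \varphi = \frac{1}{2}\sum_i (\theta^i)^2$, and the canonical divergence is
$D(p\|q) = \frac{1}{2}\{d(p, q)\}^2$, where $d(p, q) = \sqrt{\sum_i \{\theta^i(p) - \theta^i(q)\}^2}$.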
Canonical divergence
Triangular relation
Let $\{[\theta^i], [\eta_i]\}$ be mutually dual affine coordinate systems of a dually flat space $(S, g, \nabla, \nabla^*)$, and let D be a divergence on S. Then a necessary and
sufficient condition for D to be the $(g, \nabla)$-divergence is that for all $p, q, r \in S$ the following triangular relation holds:
$D(p\|q) + D(q\|r) - D(p\|r) = \{\theta^i(p) - \theta^i(q)\}\{\eta_i(r) - \eta_i(q)\}$ (37)
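The triangular relation follows from expanding the canonical divergence $D(p\|q) = \psi(p) + \varphi(q) - \theta^i(p)\eta_i(q)$ and using $\psi(q) + \varphi(q) = \theta^i(q)\eta_i(q)$. The sketch below checks it numerically on a toy dually flat space that is an assumption of this example, not taken from the slides: $\psi(\theta) = \sum_i e^{\theta^i}$, so that $\eta_i = e^{\theta^i}$ and $\varphi(\eta) = \sum_i (\eta_i \log \eta_i - \eta_i)$.

```python
import math

def psi(th):
    """Toy potential psi(theta) = sum_i exp(theta^i) (an assumption for this sketch)."""
    return sum(math.exp(t) for t in th)

def eta_of(th):
    """Dual coordinates eta_i = d psi / d theta^i = exp(theta^i)."""
    return [math.exp(t) for t in th]

def phi(et):
    """Dual potential phi(eta) = sum_i (eta_i log eta_i - eta_i)."""
    return sum(e * math.log(e) - e for e in et)

def D(p, q):
    """Canonical divergence; points are given by their theta coordinates."""
    eq = eta_of(q)
    return psi(p) + phi(eq) - sum(t * e for t, e in zip(p, eq))

# Three arbitrary points in theta coordinates (illustrative values).
p = [0.3, -1.0]
q = [1.2, 0.5]
r = [-0.7, 0.1]

lhs = D(p, q) + D(q, r) - D(p, r)
rhs = sum((tp - tq) * (er - eq)
          for tp, tq, er, eq in zip(p, q, eta_of(r), eta_of(q)))

print(lhs, rhs)
```

When the right-hand side vanishes, that is, when the $\theta$-displacement from q to p is orthogonal to the $\eta$-displacement from q to r, the relation reduces to $D(p\|r) = D(p\|q) + D(q\|r)$.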
Canonical divergence
Pythagorean relation
Pythagorean relation
Let p, q, and r be three points in S. Let $\gamma_1$ be the $\nabla$-geodesic connecting p and
q, and let $\gamma_2$ be the $\nabla^*$-geodesic connecting q and r. If at the intersection q the curves $\gamma_1$ and $\gamma_2$ are orthogonal (with respect to the inner product g), then we have the following Pythagorean relation:
$D(p\|r) = D(p\|q) + D(q\|r)$ (38)
Canonical divergence
Projection theorem
Projection theorem
Let p be a point in S and let M be a submanifold of S which is
$\nabla^*$-autoparallel. Then a necessary and sufficient condition for a point q in M to satisfy
$D(p\|q) = \min_{r \in M} D(p\|r)$ (39)
is for the $\nabla$-geodesic connecting p and q to be orthogonal to M at q.
Canonical divergence
Examples
From the definitions of exponential family and mixture family, a product of
exponential families is still an exponential family, and a sum of mixture families is
still a mixture family.
e-flat submanifold: set of all product distributions:
$E_0 = \{p_X \mid p_X(x_1, \ldots, x_N) = \prod_{i=1}^{N} p_{X_i}(x_i)\}$
m-flat submanifold: set of joint distributions with given marginals:
$M_0 = \{p_X \mid \sum_{X_{\setminus i}} p_X(x) = q_i(x_i) \ \forall i \in \{1, \ldots, N\}\}$
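A small numerical illustration of projecting onto the set of product distributions, for two binary variables. It relies on a standard fact that is stated here as background rather than taken from the slides: among all product distributions q, $D_{KL}(p\|q)$ is minimized by the product of the marginals of p, and a Pythagorean-type decomposition $D_{KL}(p\|q) = D_{KL}(p\|p^*) + D_{KL}(p^*\|q)$ holds for every product q. The joint distribution, the grid resolution and all names are assumptions for illustration.

```python
import math

def kl(p, q):
    """Natural-log KL divergence between discrete distributions (lists)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def product(a, b):
    """Joint of two independent binary variables with P(X1=1)=a, P(X2=1)=b.

    Outcomes are ordered (0,0), (0,1), (1,0), (1,1).
    """
    return [(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b]

# Illustrative joint distribution over (X1, X2); it is not a product distribution.
p = [0.4, 0.1, 0.2, 0.3]

a_star = p[2] + p[3]            # marginal P(X1 = 1)
b_star = p[1] + p[3]            # marginal P(X2 = 1)
p_star = product(a_star, b_star)

# Brute-force search over product distributions on a 0.01 grid.
grid = [i / 100.0 for i in range(1, 100)]
best_val, best_a, best_b = min(
    (kl(p, product(a, b)), a, b) for a in grid for b in grid
)

# Pythagorean-type decomposition for an arbitrary product distribution q.
q = product(0.7, 0.2)
lhs = kl(p, q)
rhs = kl(p, p_star) + kl(p_star, q)

print(best_a, best_b, a_star, b_star)
print(lhs, rhs)
```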
Canonical divergence
Examples
Thanks!
Questions?