Introduction To DCA
K.K M
December 12, 2016
Contents
• Distance And Correlation
• Maximum-Entropy Probability Model
• Maximum-Likelihood Inference
• Pairwise Interaction Scoring Function
• Summary
Definition of Distance
A statistical distance quantifies the distance between two statistical objects, which can be two random variables, two probability distributions or samples, or an individual sample point and a population or a wider sample of points.

• d(x, x) = 0 (identity of indiscernibles)
• d(x, y) ≥ 0 (non-negativity)
• d(x, y) = d(y, x) (symmetry)
• d(x, k) + d(k, y) ≥ d(x, y) (triangle inequality)
Different Distances
• Minkowski distance
• Euclidean distance
• Manhattan distance
• Chebyshev distance
• Mahalanobis distance
• Cosine similarity
• Pearson correlation
• Hamming distance
• Jaccard similarity
• Levenshtein distance
• DTW distance
• KL-divergence
Minkowski Distance
Let $P = (x_1, x_2, \dots, x_n)$ and $Q = (y_1, y_2, \dots, y_n) \in \mathbb{R}^n$.

• Minkowski distance: $d(P, Q) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
• Euclidean distance: $p = 2$
• Manhattan distance: $p = 1$
• Chebyshev distance: $p = \infty$, since
$$\lim_{p \to \infty} \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} = \max_{i=1}^{n} |x_i - y_i|$$
• z-transform: $(x_i, y_i) \mapsto \left( \frac{x_i - \mu_x}{\sigma_x}, \frac{y_i - \mu_y}{\sigma_y} \right)$, where the $x_i$ and $y_i$ are independent
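To make the definitions concrete, here is a minimal NumPy sketch (not part of the original slides; the example vectors are arbitrary):

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])
print(minkowski(x, y, 1))      # Manhattan: 5.0
print(minkowski(x, y, 2))      # Euclidean: ~3.606
print(np.max(np.abs(x - y)))   # Chebyshev, the p -> infinity limit: 3.0
```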
Mahalanobis Distance
• Mahalanobis distance: $d_M(x) = \sqrt{(x - \mu)^T C^{-1} (x - \mu)}$
• Covariance matrix factorization: $C = L L^T$ (Cholesky)
• Whitening transformation: $x \mapsto x' = L^{-1}(x - \mu)$, after which the Mahalanobis distance equals the Euclidean distance of the transformed points
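A minimal NumPy sketch of this whitening view (not from the original slides; the covariance and test point are made up) shows that the Euclidean norm of $L^{-1}(x - \mu)$ reproduces the Mahalanobis distance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=500)

mu = X.mean(axis=0)
C = np.cov(X, rowvar=False)      # empirical covariance matrix C = L L^T
L = np.linalg.cholesky(C)        # Cholesky factor L

x = np.array([2.0, 1.0])
z = np.linalg.solve(L, x - mu)   # whitening transform x -> L^{-1}(x - mu)

d_mahal = np.linalg.norm(z)      # Euclidean norm after whitening
d_check = np.sqrt((x - mu) @ np.linalg.solve(C, x - mu))
print(d_mahal, d_check)          # the two values agree
```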
Example for the Mahalanobis distance
Figure: Euclidean distance
Figure: Mahalanobis distance
Pearson Correlation
• Inner product: $\mathrm{Inner}(x, y) = \langle x, y \rangle = \sum_i x_i y_i$
• Cosine similarity: $\mathrm{CosSim}(x, y) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}} = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}$
• Pearson correlation:
$$\mathrm{Corr}(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} = \frac{\langle x - \bar{x},\, y - \bar{y} \rangle}{\|x - \bar{x}\|\,\|y - \bar{y}\|} = \mathrm{CosSim}(x - \bar{x},\, y - \bar{y})$$
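The last identity is easy to check numerically; a small sketch (illustrative data, not from the slides):

```python
import numpy as np

def cos_sim(x, y):
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def pearson(x, y):
    # Pearson correlation = cosine similarity of the mean-centered vectors
    return cos_sim(x - x.mean(), y - y.mean())

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
print(pearson(x, y))            # close to 1 for nearly linear data
print(np.corrcoef(x, y)[0, 1])  # agrees with NumPy's built-in
```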
Different Values for the Pearson Correlation
Limits of Pearson Correlation
Q: Why not use the Pearson correlation to score pairwise associations?

A: The Pearson correlation is a misleading measure of direct dependence, as it only reflects the association between two variables while ignoring the influence of the remaining ones.
Partial Correlation
For M given samples of L measured variables:
$$x^1 = (x_1^1, \dots, x_L^1)^T, \;\dots,\; x^M = (x_1^M, \dots, x_L^M)^T \in \mathbb{R}^L$$
Pearson correlation coefficient:
$$r_{ij} = \frac{C_{ij}}{\sqrt{C_{ii}\, C_{jj}}}$$
where $C_{ij} = \frac{1}{M} \sum_{m=1}^{M} (x_i^m - \bar{x}_i)(x_j^m - \bar{x}_j)$ is the empirical covariance matrix. Scale each variable to zero mean and unit standard deviation,
$$x_i \mapsto \frac{x_i - \bar{x}_i}{\sqrt{C_{ii}}}$$
which simplifies the correlation coefficient to $r_{ij} \equiv \overline{x_i x_j}$.
Partial Correlation of Three-variable System
The partial correlation between X and Y given a set of n controlling variables $Z = \{Z_1, Z_2, \dots, Z_n\}$, written $r_{XY \cdot Z}$, can be computed from the residuals $r_{X,i}$ and $r_{Y,i}$ of regressing X and Y on Z over N samples:
$$r_{XY \cdot Z} = \frac{N \sum_{i=1}^{N} r_{X,i}\, r_{Y,i} - \sum_{i=1}^{N} r_{X,i} \sum_{i=1}^{N} r_{Y,i}}{\sqrt{N \sum_{i=1}^{N} r_{X,i}^2 - \big(\sum_{i=1}^{N} r_{X,i}\big)^2}\; \sqrt{N \sum_{i=1}^{N} r_{Y,i}^2 - \big(\sum_{i=1}^{N} r_{Y,i}\big)^2}}$$
In particular, when Z is a single variable, the partial correlation in a three-variable system between $x_A$ and $x_B$ given $x_C$ is defined as
$$r_{AB \cdot C} = \frac{r_{AB} - r_{AC}\, r_{BC}}{\sqrt{1 - r_{AC}^2}\, \sqrt{1 - r_{BC}^2}} \equiv -\frac{(C^{-1})_{AB}}{\sqrt{(C^{-1})_{AA}\, (C^{-1})_{BB}}}$$
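The identity on the right suggests a direct implementation via the inverse covariance (precision) matrix. A sketch (toy data invented for illustration) in which two variables are strongly Pearson-correlated only because a third variable drives both:

```python
import numpy as np

def partial_corr(X):
    """Partial correlations r_{ij.rest} = -(C^-1)_ij / sqrt((C^-1)_ii (C^-1)_jj).
    X has one sample per row, one variable per column."""
    P = np.linalg.inv(np.cov(X, rowvar=False))  # precision matrix C^-1
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R

rng = np.random.default_rng(1)
x_C = rng.normal(size=2000)                 # common driver
x_A = x_C + 0.3 * rng.normal(size=2000)
x_B = x_C + 0.3 * rng.normal(size=2000)
X = np.column_stack([x_A, x_B, x_C])
print(np.corrcoef(X, rowvar=False)[0, 1])   # Pearson: large (indirect)
print(partial_corr(X)[0, 1])                # partial: near 0 given x_C
```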
Reaction System Reconstruction: Pearson vs. Partial Correlation
Entropy
The thermodynamic definition of entropy was developed in the early 1850s by Rudolf Clausius:
$$\Delta S = \int \frac{\delta Q_{\mathrm{rev}}}{T}$$
In 1948, Shannon defined the entropy H of a discrete random variable X with possible values $x_1, \dots, x_n$ and probability mass function $p(x)$ as:
$$H(X) = -\sum_{x} p(x) \log p(x)$$
Joint & Conditional Entropy
• Joint entropy: $H(X, Y)$
• Conditional entropy: $H(Y|X)$

$$\begin{aligned}
H(X,Y) - H(X) &= -\sum_{x,y} p(x,y) \log p(x,y) + \sum_{x} p(x) \log p(x) \\
&= -\sum_{x,y} p(x,y) \log p(x,y) + \sum_{x} \Big( \sum_{y} p(x,y) \Big) \log p(x) \\
&= -\sum_{x,y} p(x,y) \log p(x,y) + \sum_{x,y} p(x,y) \log p(x) \\
&= -\sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)} \\
&= -\sum_{x,y} p(x,y) \log p(y|x) \\
&= H(Y|X)
\end{aligned}$$
Relative Entropy & Mutual Information
• Relative entropy (KL-divergence): $D(p\|q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$
• $D(p\|q) \neq D(q\|p)$ in general (not symmetric)
• $D(p\|q) \geq 0$
• Mutual information:
$$I(X, Y) = D\big(\, p(x,y) \,\big\|\, p(x)p(y) \,\big) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}$$
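Both quantities are a few lines in NumPy; the sketch below (toy distributions, not from the slides) also shows numerically that $D(p\|q)$ and $D(q\|p)$ differ:

```python
import numpy as np

def kl(p, q):
    """Relative entropy D(p||q) for discrete distributions with full support."""
    return np.sum(p * np.log(p / q))

def mutual_information(pxy):
    """I(X,Y) = D( p(x,y) || p(x)p(y) ) from a joint probability table."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    return np.sum(pxy * np.log(pxy / (px * py)))

p = np.array([0.6, 0.4])
q = np.array([0.3, 0.7])
print(kl(p, q), kl(q, p))       # the two directions differ

pxy = np.array([[0.25, 0.15],
                [0.10, 0.50]])
print(mutual_information(pxy))  # > 0: X and Y are dependent
```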
Mutual Information
$$\begin{aligned}
H(Y) - I(X,Y) &= -\sum_{y} p(y) \log p(y) - \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} \\
&= -\sum_{y} \Big( \sum_{x} p(x,y) \Big) \log p(y) - \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} \\
&= -\sum_{x,y} p(x,y) \log p(y) - \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} \\
&= -\sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)} \\
&= H(Y|X)
\end{aligned}$$

Combined with $H(X,Y) - H(X) = H(Y|X)$, this gives $I(X,Y) = H(X) + H(Y) - H(X,Y)$.
MI & DI
• $\mathrm{MI}_{ij} = \sum_{\sigma,\omega} f_{ij}(\sigma,\omega) \ln \left( \frac{f_{ij}(\sigma,\omega)}{f_i(\sigma)\, f_j(\omega)} \right)$
• $\mathrm{DI}_{ij} = \sum_{\sigma,\omega} P^{\mathrm{dir}}_{ij}(\sigma,\omega) \ln \left( \frac{P^{\mathrm{dir}}_{ij}(\sigma,\omega)}{f_i(\sigma)\, f_j(\omega)} \right)$

where $P^{\mathrm{dir}}_{ij}(\sigma,\omega) = \frac{1}{z_{ij}} \exp\big( e_{ij}(\sigma,\omega) + h_i(\sigma) + h_j(\omega) \big)$
Principle of Maximum Entropy
• The principle was first expounded by E. T. Jaynes in two papers in 1957, where he emphasized a natural correspondence between statistical mechanics and information theory.
• The principle of maximum entropy states that, subject to precisely stated prior data, the probability distribution which best represents the current state of knowledge is the one with largest entropy.
• Maximize $S = -\int_{x} P(x) \ln P(x)\, dx$
Example
• Suppose we want to estimate a probability distribution p(a, b), where a ∈ {x, y} and b ∈ {0, 1}.
• Furthermore, the only fact known about p is that p(x, 0) + p(y, 0) = 0.6.
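The maximum-entropy answer can also be computed numerically; a sketch with SciPy (the solver setup is one possible choice, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

# unknowns: [p(x,0), p(y,0), p(x,1), p(y,1)]
def neg_entropy(p):
    return np.sum(p * np.log(p))

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},      # normalization
    {"type": "eq", "fun": lambda p: p[0] + p[1] - 0.6},  # p(x,0) + p(y,0) = 0.6
]
res = minimize(neg_entropy, x0=np.full(4, 0.25),
               bounds=[(1e-9, 1.0)] * 4, constraints=constraints)
print(res.x)  # -> [0.3, 0.3, 0.2, 0.2], the most uncertain assignment
```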
Unbiased Principle
Figure: One way to satisfy constraints
Figure: The most uncertain way to satisfy constraints
Maximum-Entropy Probability Model
• Continuous random variables: $x = (x_1, \dots, x_L)^T \in \mathbb{R}^L$
• Constraints:
  • $\int_{x} P(x)\, dx = 1$
  • $\langle x_i \rangle = \int_{x} P(x)\, x_i\, dx = \frac{1}{M} \sum_{m=1}^{M} x_i^m = \bar{x}_i$
  • $\langle x_i x_j \rangle = \int_{x} P(x)\, x_i x_j\, dx = \frac{1}{M} \sum_{m=1}^{M} x_i^m x_j^m = \overline{x_i x_j}$
• Maximize: $S = -\int_{x} P(x) \ln P(x)\, dx$
Lagrange Multipliers Method
Convert a constrained optimization problem into an unconstrained one by means of the Lagrangian $\mathcal{L}$:
$$\mathcal{L} = S + \alpha(\langle 1 \rangle - 1) + \sum_{i=1}^{L} \beta_i \big( \langle x_i \rangle - \bar{x}_i \big) + \sum_{i,j=1}^{L} \gamma_{ij} \big( \langle x_i x_j \rangle - \overline{x_i x_j} \big)$$
$$\frac{\delta \mathcal{L}}{\delta P(x)} = 0 \;\Rightarrow\; -\ln P(x) - 1 + \alpha + \sum_{i=1}^{L} \beta_i x_i + \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j = 0$$
Pairwise maximum-entropy probability distribution:
$$P(x; \beta, \gamma) = \exp\Big( -1 + \alpha + \sum_{i=1}^{L} \beta_i x_i + \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j \Big) = \frac{1}{Z}\, e^{-\mathcal{H}(x; \beta, \gamma)}$$
Lagrange Multipliers Method
• Partition function as normalization constant:
$$Z(\beta, \gamma) := \int_{x} \exp\Big( \sum_{i=1}^{L} \beta_i x_i + \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j \Big)\, dx \equiv \exp(1 - \alpha)$$
• Hamiltonian:
$$\mathcal{H}(x) := -\sum_{i=1}^{L} \beta_i x_i - \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j$$
Categorical Random Variables
• Jointly distributed categorical variables $x = (x_1, \dots, x_L)^T \in \Omega^L$, where each $x_i$ is defined on the finite set $\Omega = \{\sigma_1, \dots, \sigma_q\}$.
• In the concrete example of modeling protein co-evolution, this set contains the 20 amino acids represented by a 20-letter alphabet plus one gap element:
$$\Omega = \{A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, -\}, \qquad q = 21$$
Binary Embedding of Amino Acid Sequence
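The embedding maps each position of a length-L sequence to a q-dimensional indicator vector $x_i(\sigma)$; a minimal sketch (encoding convention assumed, not from the original slides):

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"   # 20 amino acids plus gap, q = 21
q = len(ALPHABET)
INDEX = {a: k for k, a in enumerate(ALPHABET)}

def one_hot(seq):
    """Binary embedding: length-L sequence -> L*q vector of indicators x_i(sigma)."""
    x = np.zeros((len(seq), q))
    for i, a in enumerate(seq):
        x[i, INDEX[a]] = 1.0
    return x.ravel()

x = one_hot("AC-")
print(x.reshape(3, q).sum(axis=1))   # exactly one '1' per position
```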
Pairwise MEP Distribution on Categorical Variables
• Single and pairwise marginal probabilities:
$$\langle x_i(\sigma) \rangle = \sum_{x(\sigma)} P(x(\sigma))\, x_i(\sigma) = P(x_i = \sigma) = P_i(\sigma)$$
$$\langle x_i(\sigma)\, x_j(\omega) \rangle = \sum_{x(\sigma)} P(x(\sigma))\, x_i(\sigma)\, x_j(\omega) = P(x_i = \sigma, x_j = \omega) = P_{ij}(\sigma, \omega)$$
• Single-site and pair frequency counts:
$$\overline{x_i(\sigma)} = \frac{1}{M} \sum_{m=1}^{M} x_i^m(\sigma) = f_i(\sigma)$$
$$\overline{x_i(\sigma)\, x_j(\omega)} = \frac{1}{M} \sum_{m=1}^{M} x_i^m(\sigma)\, x_j^m(\omega) = f_{ij}(\sigma, \omega)$$
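From an integer-encoded multiple sequence alignment these counts are straightforward to accumulate; a sketch (shapes and encoding are assumptions, not from the slides):

```python
import numpy as np

def frequencies(msa, q):
    """Single-site f_i(sigma) and pair f_ij(sigma, omega) frequency counts
    from an integer-encoded alignment msa of shape (M, L)."""
    M, L = msa.shape
    fi = np.zeros((L, q))
    fij = np.zeros((L, L, q, q))
    pos = np.arange(L)
    for m in range(M):
        fi[pos, msa[m]] += 1.0
        fij[pos[:, None], pos[None, :],
            msa[m][:, None], msa[m][None, :]] += 1.0
    return fi / M, fij / M

msa = np.array([[0, 1, 2], [0, 1, 0], [0, 2, 2], [1, 1, 2]])
fi, fij = frequencies(msa, q=3)
print(fi[0])   # [0.75, 0.25, 0.0]: column 0 is mostly state 0
```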
Pairwise MEP Distribution on Categorical Variables
• Constraints:
$$\sum_{x} P(x) = \sum_{x(\sigma)} P(x(\sigma)) = 1, \qquad P_i(\sigma) = f_i(\sigma), \quad P_{ij}(\sigma,\omega) = f_{ij}(\sigma,\omega)$$
• Maximize
$$S = -\sum_{x} P(x) \ln P(x) = -\sum_{x(\sigma)} P(x(\sigma)) \ln P(x(\sigma))$$
• Lagrangian:
$$\mathcal{L} = S + \alpha(\langle 1 \rangle - 1) + \sum_{i=1}^{L} \sum_{\sigma \in \Omega} \beta_i(\sigma) \big( P_i(\sigma) - f_i(\sigma) \big) + \sum_{i,j=1}^{L} \sum_{\sigma,\omega \in \Omega} \gamma_{ij}(\sigma,\omega) \big( P_{ij}(\sigma,\omega) - f_{ij}(\sigma,\omega) \big)$$
Pairwise MEP Distribution on Categorical Variables
Distribution:
$$P(x(\sigma); \beta, \gamma) = \frac{1}{Z} \exp\Big( \sum_{i=1}^{L} \sum_{\sigma \in \Omega} \beta_i(\sigma)\, x_i(\sigma) + \sum_{i,j=1}^{L} \sum_{\sigma,\omega \in \Omega} \gamma_{ij}(\sigma,\omega)\, x_i(\sigma)\, x_j(\omega) \Big)$$
Let
$$h_i(\sigma) := \beta_i(\sigma) + \gamma_{ii}(\sigma,\sigma), \qquad e_{ij}(\sigma,\omega) := 2\gamma_{ij}(\sigma,\omega)$$
Then
$$P(x_1, \dots, x_L) \equiv \frac{1}{Z} \exp\Big( \sum_{i=1}^{L} h_i(x_i) + \sum_{1 \leq i < j \leq L} e_{ij}(x_i, x_j) \Big)$$
Gauge Fixing
There are $L(q-1) + \frac{L(L-1)}{2}(q-1)^2$ independent constraints compared to $Lq + \frac{L(L-1)}{2} q^2$ free parameters to be estimated, because of
$$1 = \sum_{\sigma \in \Omega} P_i(\sigma), \quad i = 1, \dots, L$$
$$P_i(\sigma) = \sum_{\omega \in \Omega} P_{ij}(\sigma, \omega), \quad i, j = 1, \dots, L$$
Two gauge-fixing choices:
• gap to zero: $e_{ij}(\sigma_q, \cdot) = e_{ij}(\cdot, \sigma_q) = 0$, $h_i(\sigma_q) = 0$
• zero-sum gauge: $\sum_{\sigma} e_{ij}(\sigma, \omega) = \sum_{\sigma} e_{ij}(\omega', \sigma) = 0$, $\sum_{\sigma} h_i(\sigma) = 0$
Closed-Form Solution for Continuous Variables
Rewrite the exponent of the pairwise maximum-entropy probability distribution:
$$P(x; \beta, \gamma) = \frac{1}{Z} \exp\Big( \beta^T x - \frac{1}{2} x^T \tilde{\gamma} x \Big) = \frac{1}{Z} \exp\Big( \frac{1}{2} \beta^T \tilde{\gamma}^{-1} \beta - \frac{1}{2} (x - \tilde{\gamma}^{-1}\beta)^T \tilde{\gamma}\, (x - \tilde{\gamma}^{-1}\beta) \Big)$$
where $\tilde{\gamma} := -2\gamma$. Let $z = (z_1, \dots, z_L)^T := x - \tilde{\gamma}^{-1}\beta$; then
$$P(x) = \frac{1}{\tilde{Z}} \exp\Big( -\frac{1}{2} (x - \tilde{\gamma}^{-1}\beta)^T \tilde{\gamma}\, (x - \tilde{\gamma}^{-1}\beta) \Big) \equiv \frac{1}{\tilde{Z}}\, e^{-\frac{1}{2} z^T \tilde{\gamma} z}$$
Normalization condition:
$$1 = \int_{x} P(x)\, dx \equiv \frac{1}{\tilde{Z}} \int_{z} e^{-\frac{1}{2} z^T \tilde{\gamma} z}\, dz$$
Closed-Form Solution for Continuous Variables
Use the point symmetry of the integrand:
$$\int_{z} e^{-\frac{1}{2} z^T \tilde{\gamma} z}\, z_i\, dz = 0$$
Then
$$\langle x_i \rangle = \int_{x} P(x)\, x_i\, dx \equiv \frac{1}{\tilde{Z}} \int_{z} e^{-\frac{1}{2} z^T \tilde{\gamma} z} \Big( z_i + \sum_{j=1}^{L} (\tilde{\gamma}^{-1})_{ij} \beta_j \Big)\, dz = \sum_{j=1}^{L} (\tilde{\gamma}^{-1})_{ij} \beta_j$$
$$\langle x_i x_j \rangle = \int_{x} P(x)\, x_i x_j\, dx \equiv \frac{1}{\tilde{Z}} \int_{z} e^{-\frac{1}{2} z^T \tilde{\gamma} z} \big( z_i + \langle x_i \rangle \big)\big( z_j + \langle x_j \rangle \big)\, dz = \langle z_i z_j \rangle + \langle x_i \rangle \langle x_j \rangle$$
so
$$C_{ij} = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle \equiv \langle z_i z_j \rangle$$
Closed-Form Solution for Continuous Variables
The term $\langle z_i z_j \rangle$ is solved using a spectral decomposition:
$$\langle z_i z_j \rangle = \frac{1}{\tilde{Z}} \sum_{l,n=1}^{L} (v_l)_i (v_n)_j \int_{y} \exp\Big( -\frac{1}{2} \sum_{k=1}^{L} \lambda_k y_k^2 \Big)\, y_l y_n\, dy = \sum_{k=1}^{L} \frac{1}{\lambda_k} (v_k)_i (v_k)_j \equiv (\tilde{\gamma}^{-1})_{ij}$$
With $C_{ij} = (\tilde{\gamma}^{-1})_{ij}$, the Lagrange multipliers $\beta$ and $\gamma$ are
$$\beta = C^{-1} \langle x \rangle, \qquad \gamma = -\frac{1}{2} \tilde{\gamma} = -\frac{1}{2} C^{-1}$$
The real-valued maximum-entropy distribution is thus the multivariate Gaussian:
$$P(x; \langle x \rangle, C) = (2\pi)^{-L/2} \det(C)^{-1/2} \exp\Big( -\frac{1}{2} (x - \langle x \rangle)^T C^{-1} (x - \langle x \rangle) \Big)$$
Pair Interaction Strength
The pair interaction strength is evaluated by the already introduced partial correlation coefficient between $x_i$ and $x_j$ given the remaining variables $\{x_r\}_{r \in \{1,\dots,L\} \setminus \{i,j\}}$. Since $\tilde{\gamma} = C^{-1}$,
$$\rho_{ij \cdot \{1,\dots,L\} \setminus \{i,j\}} = \begin{cases} -\dfrac{(C^{-1})_{ij}}{\sqrt{(C^{-1})_{ii}\, (C^{-1})_{jj}}} & \text{if } i \neq j, \\[2mm] 1 & \text{if } i = j. \end{cases}$$
Mean-Field Approximation
The empirical covariance matrix:
$$C_{ij}(\sigma, \omega) = f_{ij}(\sigma, \omega) - f_i(\sigma)\, f_j(\omega)$$
Applying the closed-form solution for continuous variables to the categorical variables yields the so-called mean-field (MF) approximation:
$$\gamma^{MF}_{ij}(\sigma, \omega) = -\frac{1}{2} (C^{-1})_{ij}(\sigma, \omega) \;\Rightarrow\; e^{MF}_{ij}(\sigma, \omega) = -(C^{-1})_{ij}(\sigma, \omega)$$
The same solution has been obtained by using a perturbation ansatz to solve the q-state Potts model, termed (mean-field) Direct Coupling Analysis (DCA or mfDCA).
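A compact sketch of the resulting mfDCA recipe (the pseudocount value and gauge handling are simplified choices of this sketch; production implementations treat the i = j blocks and sequence reweighting more carefully):

```python
import numpy as np

def mf_couplings(fi, fij, pc=0.5):
    """Mean-field DCA sketch: e^MF_ij(sigma, omega) = -(C^-1)_ij(sigma, omega).
    fi: (L, q) single frequencies, fij: (L, L, q, q) pair frequencies.
    The last state is dropped (q - 1 states) so that C is invertible (gauge fixing)."""
    L, q = fi.shape
    fi = (1 - pc) * fi + pc / q               # pseudocount regularization
    fij = (1 - pc) * fij + pc / (q * q)
    # connected correlation matrix C_ij(sigma, omega) in the reduced alphabet
    C = (fij[:, :, :q - 1, :q - 1]
         - fi[:, None, :q - 1, None] * fi[None, :, None, :q - 1])
    C = C.transpose(0, 2, 1, 3).reshape(L * (q - 1), L * (q - 1))
    e = -np.linalg.inv(C)                     # couplings as one big matrix
    return e.reshape(L, q - 1, L, q - 1).transpose(0, 2, 1, 3)
```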
What is Maximum-Likelihood Inference?
• A well-known approach to estimate the parameters of a model is maximum-likelihood inference.
• The likelihood is a scalar measure of how likely the model parameters are, given the observed data, and the maximum-likelihood solution denotes the parameter set maximizing the likelihood function.
Likelihood Function
To use the method of maximum likelihood, one first specifies the joint density function for all observations. For an independent and identically distributed sample, this joint density function is
$$f(x_1, x_2, \dots, x_n \mid \theta) = f(x_1 \mid \theta) \times f(x_2 \mid \theta) \times \cdots \times f(x_n \mid \theta)$$
Now let $\theta$ be the function's variable, allowed to vary freely; this function is called the likelihood:
$$\mathcal{L}(\theta; x_1, \dots, x_n) = f(x_1, x_2, \dots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$
Log-likelihood:
$$\ln \mathcal{L}(\theta; x_1, \dots, x_n) = \sum_{i=1}^{n} \ln f(x_i \mid \theta)$$
Average log-likelihood: $\hat{\ell} = \frac{1}{n} \ln \mathcal{L}$
Maximum-Likelihood Inference

For a pairwise model with parameters $h(\sigma)$ and $e(\sigma,\omega)$, the likelihood function of the observed data $x^1, \dots, x^M \in \Omega^L$, which are assumed to be independent and identically distributed, is
$$l(h(\sigma), e(\sigma,\omega) \mid x^1, \dots, x^M) = \prod_{m=1}^{M} P(x^m; h(\sigma), e(\sigma,\omega))$$
$$h^{ML}(\sigma), e^{ML}(\sigma,\omega) = \underset{h(\sigma),\, e(\sigma,\omega)}{\arg\max}\; l(h(\sigma), e(\sigma,\omega)) = \underset{h(\sigma),\, e(\sigma,\omega)}{\arg\min}\; \big[ -\ln l(h(\sigma), e(\sigma,\omega)) \big]$$
With the maximum-entropy distribution as model distribution:
$$\ln l(h(\sigma), e(\sigma,\omega)) = \sum_{m=1}^{M} \ln P(x^m; h(\sigma), e(\sigma,\omega)) = -M \Big[ \ln Z - \sum_{i=1}^{L} \sum_{\sigma} h_i(\sigma) f_i(\sigma) - \sum_{1 \leq i < j \leq L} \sum_{\sigma,\omega} e_{ij}(\sigma,\omega) f_{ij}(\sigma,\omega) \Big]$$
Maximum-Likelihood Inference
Taking the derivatives:
$$\frac{\partial}{\partial h_i(\sigma)} \ln l = -M \left[ \frac{\partial}{\partial h_i(\sigma)} \ln Z \,\Big|_{h(\sigma), e(\sigma,\omega)} - f_i(\sigma) \right] = 0$$
$$\frac{\partial}{\partial e_{ij}(\sigma,\omega)} \ln l = -M \left[ \frac{\partial}{\partial e_{ij}(\sigma,\omega)} \ln Z \,\Big|_{h(\sigma), e(\sigma,\omega)} - f_{ij}(\sigma,\omega) \right] = 0$$
Partial derivatives of the partition function:
$$\frac{\partial}{\partial h_i(\sigma)} \ln Z \,\Big|_{h(\sigma), e(\sigma,\omega)} = \frac{1}{Z} \frac{\partial Z}{\partial h_i(\sigma)} \,\Big|_{h(\sigma), e(\sigma,\omega)} = P_i(\sigma; h(\sigma), e(\sigma,\omega))$$
$$\frac{\partial}{\partial e_{ij}(\sigma,\omega)} \ln Z \,\Big|_{h(\sigma), e(\sigma,\omega)} = \frac{1}{Z} \frac{\partial Z}{\partial e_{ij}(\sigma,\omega)} \,\Big|_{h(\sigma), e(\sigma,\omega)} = P_{ij}(\sigma,\omega; h(\sigma), e(\sigma,\omega))$$
Maximum-Likelihood Inference
The maximizing parameters $h^{ML}(\sigma)$ and $e^{ML}(\sigma,\omega)$ are those matching the distribution's single and pair marginal probabilities with the empirical single and pair frequency counts:
$$P_i(\sigma; h^{ML}(\sigma), e^{ML}(\sigma,\omega)) = f_i(\sigma), \qquad P_{ij}(\sigma,\omega; h^{ML}(\sigma), e^{ML}(\sigma,\omega)) = f_{ij}(\sigma,\omega)$$
In other words, matching the moments of the pairwise maximum-entropy probability distribution to the given data is equivalent to maximum-likelihood fitting of an exponential family.
Pseudo-Likelihood Maximization
Besag introduced the pseudo-likelihood as an approximation to the likelihood function in which the global partition function is replaced by computationally tractable local estimates.

In this approach, the probability of the m-th observation, $x^m$, is approximated by the product of the conditional probabilities of $x_r = x^m_r$ given observations of the remaining variables $x_{\setminus r} := (x_1, \dots, x_{r-1}, x_{r+1}, \dots, x_L)^T \in \Omega^{L-1}$:
$$P(x^m; h(\sigma), e(\sigma,\omega)) \simeq \prod_{r=1}^{L} P(x_r = x^m_r \mid x_{\setminus r} = x^m_{\setminus r}; h(\sigma), e(\sigma,\omega))$$
Pseudo-Likelihood Maximization

Each factor has the following analytical form:
$$P(x_r = x^m_r \mid x_{\setminus r} = x^m_{\setminus r}; h(\sigma), e(\sigma,\omega)) = \frac{\exp\Big( h_r(x^m_r) + \sum_{j \neq r} e_{rj}(x^m_r, x^m_j) \Big)}{\sum_{\sigma} \exp\Big( h_r(\sigma) + \sum_{j \neq r} e_{rj}(\sigma, x^m_j) \Big)}$$
With this approximation, the log-likelihood becomes the pseudo-log-likelihood:
$$\ln l_{PL}(h(\sigma), e(\sigma,\omega)) := \sum_{m=1}^{M} \sum_{r=1}^{L} \ln P(x_r = x^m_r \mid x_{\setminus r} = x^m_{\setminus r}; h(\sigma), e(\sigma,\omega))$$
An $\ell_2$ regularizer is added to select for small absolute values of the inferred parameters:
$$h^{PLM}(\sigma), e^{PLM}(\sigma,\omega) = \underset{h(\sigma),\, e(\sigma,\omega)}{\arg\min} \Big[ -\ln l_{PL}(h(\sigma), e(\sigma,\omega)) + \lambda_h \|h(\sigma)\|_2^2 + \lambda_e \|e(\sigma,\omega)\|_2^2 \Big]$$
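The local normalizer in each factor runs over the q states of a single site only, which is what makes the approach tractable. A sketch of one conditional factor (random toy parameters, shapes assumed; not from the original slides):

```python
import numpy as np

def site_conditional(r, x, h, e):
    """P(x_r = sigma | remaining sites of x) under fields h (L, q) and
    couplings e (L, L, q, q); a sum over q states replaces the global Z."""
    L, q = h.shape
    logits = h[r].copy()
    for j in range(L):
        if j != r:
            logits += e[r, j, :, x[j]]   # e_rj(sigma, x_j) for every sigma
    w = np.exp(logits - logits.max())    # numerically stable softmax
    return w / w.sum()

rng = np.random.default_rng(0)
L, q = 5, 21
h = rng.normal(scale=0.1, size=(L, q))
e = rng.normal(scale=0.05, size=(L, L, q, q))
x = rng.integers(0, q, size=L)
print(site_conditional(2, x, h, e).sum())   # sums to 1
```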
Scoring Functions for Pairwise Interaction Strengths
• Direct information: $\mathrm{DI}_{ij} = \sum_{\sigma,\omega \in \Omega} P^{\mathrm{dir}}_{ij}(\sigma,\omega) \ln \dfrac{P^{\mathrm{dir}}_{ij}(\sigma,\omega)}{f_i(\sigma)\, f_j(\omega)}$
• Frobenius norm: $\|e_{ij}\|_F = \Big( \sum_{\sigma,\omega \in \Omega} e_{ij}(\sigma,\omega)^2 \Big)^{1/2}$
• Average product correction: $\mathrm{APC\text{-}FN}_{ij} = \|e_{ij}\|_F - \dfrac{\|e_{i\cdot}\|_F\, \|e_{\cdot j}\|_F}{\|e_{\cdot\cdot}\|_F}$

where
$$\|e_{i\cdot}\|_F := \frac{1}{L-1} \sum_{j=1}^{L} \|e_{ij}\|_F, \qquad \|e_{\cdot j}\|_F := \frac{1}{L-1} \sum_{i=1}^{L} \|e_{ij}\|_F, \qquad \|e_{\cdot\cdot}\|_F := \frac{1}{L(L-1)} \sum_{i,j=1}^{L} \|e_{ij}\|_F$$
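A sketch of the Frobenius-norm score with APC (the coupling-tensor shape is an assumption of this sketch, not from the slides):

```python
import numpy as np

def apc_scores(e):
    """Frobenius norms ||e_ij||_F with average product correction (APC).
    e: couplings of shape (L, L, q, q), ideally in the zero-sum gauge."""
    F = np.sqrt(np.sum(e ** 2, axis=(2, 3)))   # ||e_ij||_F
    np.fill_diagonal(F, 0.0)
    L = F.shape[0]
    row = F.sum(axis=1) / (L - 1)              # ||e_i.||_F
    col = F.sum(axis=0) / (L - 1)              # ||e_.j||_F
    mean = F.sum() / (L * (L - 1))             # ||e_..||_F
    return F - np.outer(row, col) / mean       # APC-FN_ij
```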
Summary