Machine learning techniques in image analysis
TRANSCRIPT
Semi-supervised learning
Learning from both labeled and unlabeled data
Motivation: labeled data may be hard/expensive to get, but unlabeled data is usually cheaply available in much greater quantity
COMP 875 Machine learning techniques in image analysis
How can unlabeled data help?
Example: Text classification (Source: J. Zhu)
Classify astronomy vs. travel articles
Similarity measured by word overlap
When labeled data alone fails:
What if there are no overlapping words?
Unlabeled data as stepping stones:
Labels “propagate” via similar unlabeled articles
Another example (Source: J. Zhu)
Handwritten digit recognition with pixel-wise Euclidean distance
[Figure: two digits with large pixel-wise distance are not directly similar, but become indirectly similar through “stepping stone” examples]
Types of semi-supervised learning
Inductive learning: given a training set L of labeled data and U of unlabeled data, learn a predictor that can be applied to a brand-new unlabeled point not in U.
Transductive learning: given L and U, learn a predictor that can be applied only to U (i.e., the predictor cannot be easily extended to previously unseen data).
Simplest semi-supervised learning algorithm: Self-training (Source: J. Zhu)
Input: labeled data L and unlabeled data U
Repeat:
1 Learn predictor f from labeled data L using supervised learning
2 Apply f to the unlabeled instances in U
3 Remove a subset from U and add that subset and its inferred labels to L
How might we select this subset?
Advantages/disadvantages of this scheme?
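The loop above can be sketched in code. A minimal sketch (numpy only); the nearest-centroid base learner and the "most confident first" selection rule are illustrative choices of mine, not prescribed by the slide:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Toy supervised learner: one centroid per class (illustrative choice)."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(model, X):
    """Return predicted labels and a confidence score (negative distance)."""
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)], -d.min(axis=1)

def self_train(X_l, y_l, X_u, k=1):
    """Self-training loop from the slide: fit on L, label U, then move the
    k most confident unlabeled points (with their inferred labels) into L."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    while len(X_u) > 0:
        model = nearest_centroid_fit(X_l, y_l)
        y_hat, conf = nearest_centroid_predict(model, X_u)
        pick = np.argsort(conf)[-k:]            # most confident subset
        X_l = np.vstack([X_l, X_u[pick]])
        y_l = np.concatenate([y_l, y_hat[pick]])
        X_u = np.delete(X_u, pick, axis=0)
    return X_l, y_l
```

Selecting high-confidence points first is the usual answer to "how might we select this subset?", though it can also reinforce early mistakes — the main disadvantage asked about on the slide.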
Self-training with nearest-neighbor classifier (Source: J. Zhu)
Input: labeled data L and unlabeled data U
Repeat:
1 Find unlabeled point x that is closest to a labeled point x′ and assign to x the label of x′.
2 Remove x from U; add it and its estimated label to L.
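This nearest-neighbor variant is concrete enough to code directly. A minimal sketch (numpy; the function name is mine):

```python
import numpy as np

def propagating_1nn(X_l, y_l, X_u):
    """Self-training with a 1-NN classifier: repeatedly move the unlabeled
    point closest to ANY labeled point into L, copying that point's label."""
    X_l, y_l, X_u = map(np.asarray, (X_l, y_l, X_u))
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    while len(X_u) > 0:
        # pairwise distances between unlabeled and labeled points
        d = np.linalg.norm(X_u[:, None, :] - X_l[None, :, :], axis=2)
        i, j = np.unravel_index(d.argmin(), d.shape)   # closest (x, x') pair
        X_l = np.vstack([X_l, X_u[i]])
        y_l = np.append(y_l, y_l[j])                   # x inherits label of x'
        X_u = np.delete(X_u, i, axis=0)
    return X_l, y_l
```

On a chain of unlabeled points this reproduces the "stepping stones" effect: labels spread point by point through regions of high density.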
Propagating nearest-neighbor: Example (Source: J. Zhu)
[Figure: label propagation shown at (a) iteration 1, (b) iteration 25, (c) iteration 74, (d) final]
Another example (Source: J. Zhu)
[Figure: four panels (a)–(d) showing another propagating nearest-neighbor run]
Another simple approach: Cluster-and-label (Source: J. Zhu)
Input: labeled data L and unlabeled data U
Repeat:
1 Cluster L ∪ U
2 For each cluster, let S be the set of labeled instances in that cluster
3 Learn a supervised predictor from S and apply it to all the unlabeled instances in that cluster
What is the underlying assumption here?
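A sketch of cluster-and-label, assuming k-means as the clustering step and a majority vote as the per-cluster predictor (both are illustrative choices; the slide leaves them open):

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal k-means for the clustering step (first k points used as
    initial centers -- fine for this illustrative sketch)."""
    centers = X[:k].copy()
    for _ in range(iters):
        assign = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = X[assign == c].mean(axis=0)
    return assign

def cluster_and_label(X_l, y_l, X_u, k):
    """Cluster L ∪ U, then label each cluster's unlabeled points by a
    majority vote over the labeled points that fell into that cluster."""
    X_l, y_l, X_u = map(np.asarray, (X_l, y_l, X_u))
    X = np.vstack([X_l, X_u])
    assign = kmeans(X, k)
    n_l = len(X_l)
    y_u = np.empty(len(X_u), dtype=int)
    for c in range(k):
        S = y_l[assign[:n_l] == c]             # labeled instances in cluster c
        vote = np.bincount(S).argmax() if len(S) else -1  # -1: no labeled point
        y_u[assign[n_l:] == c] = vote
    return y_u
```

This makes the underlying assumption explicit: it works only when the clusters found by the clustering algorithm actually align with the classes.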
Cluster-and-label: Examples (Source: J. Zhu)
Hierarchical clustering, majority vote predictor within cluster
Generative models (Source: J. Zhu)
Labeled data (Xl, Yl):
Assuming each class has a Gaussian distribution, how do we find the decision boundary?
The most likely model, and its decision boundary
Labeled data (Xl, Yl) and unlabeled data Xu:
What is the most likely decision boundary now?
The two boundaries are different because they maximize different quantities:
p(Xl, Yl|θ)   vs.   p(Xl, Yl, Xu|θ)
Gaussian mixture model: θ are the component weights, means, and covariances
Generative models (Source: J. Zhu)
Only labeled data:
p(Xl, Yl|θ) = ∏i p(xi, yi|θ) = ∏i p(yi|θ) p(xi|yi, θ)
ML estimate for θ: sample means, covariances, and proportions for each of the classes
Labeled and unlabeled data:
p(Xl, Yl, Xu|θ) = p(Xl, Yl|θ) ∑Yu p(Xu, Yu|θ)
= ( ∏i labeled p(yi|θ) p(xi|yi, θ) ) ( ∏j unlabeled ∑c p(c|θ) p(xj|c, θ) )
ML estimate for θ: use EM (Yu are hidden variables)
The EM algorithm for Gaussian mixtures (Source: J. Zhu)
1 Start from MLE θ = {pc, µc, Σc} on (Xl, Yl):
pc: proportion of class c
µc: sample mean of class c
Σc: sample covariance matrix of class c
Repeat:
2 The E-step: compute the expected label p(y|x, θ) for all x in Xu.
3 The M-step: update MLE θ with the “softly labeled” Xu.
Special case of EM for Gaussian mixtures where the component assignments of labeled data are fixed.
Can also be viewed as a special case of self-training.
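The three steps can be sketched for a 1-D, two-class mixture (a toy version: the slides assume full-covariance Gaussians in general; the function name and the small variance floor are mine):

```python
import numpy as np

def semi_supervised_gmm(x_l, y_l, x_u, n_iter=50):
    """EM for a 1-D two-class Gaussian mixture with labeled + unlabeled data.
    Labeled points keep fixed (hard) component assignments; unlabeled points
    get soft labels p(y|x, θ) in the E-step."""
    x_l, y_l, x_u = map(np.asarray, (x_l, y_l, x_u))
    r_l = np.eye(2)[y_l]                      # fixed responsibilities (labeled)
    x = np.concatenate([x_l, x_u])
    # 1. initialize θ = {p_c, µ_c, σ_c} by MLE on the labeled data only
    p = np.array([np.mean(y_l == c) for c in (0, 1)])
    mu = np.array([x_l[y_l == c].mean() for c in (0, 1)])
    sig = np.array([x_l[y_l == c].std() + 1e-3 for c in (0, 1)])
    for _ in range(n_iter):
        # 2. E-step: expected labels p(y|x, θ) for the unlabeled points
        lik = p * np.exp(-0.5 * ((x_u[:, None] - mu) / sig) ** 2) / sig
        r_u = lik / lik.sum(axis=1, keepdims=True)
        r = np.vstack([r_l, r_u])
        # 3. M-step: MLE update of θ using the "softly labeled" data
        n_c = r.sum(axis=0)
        p = n_c / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n_c
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_c) + 1e-3
    return p, mu, sig
```

Keeping r_l fixed is exactly what makes this the "special case of EM" mentioned above; replacing the soft r_u with hard argmax labels would turn it into self-training.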
Limitations of mixture models (Source: J. Zhu)
Assumption: mixture components correspond to class-conditional distributions.
When the assumption is wrong:
Discriminative approach: Semi-supervised SVMs (Source: J. Zhu)
Idea: try to keep labeled points outside the margin, while maximizing the margin.
Review: Standard SVMs
Classification function: f(x) = wTx + w0.
Standard SVM objective function:
min_{w,w0}  ‖w‖² + λ1 ∑i (1 − yi f(xi))+
Semi-supervised SVMs (Source: J. Zhu)
Classification function: f(x) = wTx + w0.
To incorporate unlabeled points, assign to them putative labels sgn(f(x)).
Semi-supervised SVM objective function:
min_{w,w0}  ‖w‖² + λ1 ∑_{i labeled} (1 − yi f(xi))+ + λ2 ∑_{j unlabeled} (1 − |f(xj)|)+
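The objective can be evaluated directly, which makes the role of the two hinge terms concrete. A sketch (numpy; the λ defaults are placeholders of mine):

```python
import numpy as np

def s3vm_objective(w, w0, X_l, y_l, X_u, lam1=1.0, lam2=0.1):
    """Semi-supervised SVM objective from the slide:
    ‖w‖² + λ1 Σ_labeled (1 − y f(x))+ + λ2 Σ_unlabeled (1 − |f(x)|)+."""
    w, X_l, y_l, X_u = map(np.asarray, (w, X_l, y_l, X_u))
    f_l = X_l @ w + w0                          # f(x) = wᵀx + w0
    f_u = X_u @ w + w0
    hinge = lambda z: np.maximum(0.0, 1.0 - z)
    return (w @ w
            + lam1 * hinge(y_l * f_l).sum()     # labeled points in the margin
            + lam2 * hinge(np.abs(f_u)).sum())  # unlabeled points in the margin
```

The unlabeled term (1 − |f(xj)|)+ is what pushes the decision boundary away from unlabeled points; it also makes the objective non-convex, which is why S3VMs are harder to optimize than standard SVMs.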
Graph-based semi-supervised learning (Source: J. Zhu)
Idea: construct a graph where nodes are labeled and unlabeled examples, and edges are weighted by the similarity of examples.
Unlabeled data can help “glue” the objects of the same class together.
Assumption: items connected by “heavy” edges are likely to have the same label.
Graph-based semi-supervised learning (Source: J. Zhu)
The mincut algorithm:
Assume binary classification (class labels are 0, 1).
Approach: fix Yl, find Yu to minimize ∑_{i∼j} wij |yi − yj|.
Combinatorial problem, but has polynomial-time solution.
Harmonic functions:
Let’s relax discrete labels to continuous values in R.
We want to find the harmonic function f that satisfies f(x) = y for all x in Xl and minimizes the energy ∑_{i∼j} wij (f(xi) − f(xj))².
A random walk interpretation (Source: J. Zhu)
Randomly walk from node i to j with probability wij / ∑k wik.
Stop if we hit a labeled node.
The harmonic function has the following interpretation: f(xi) = P(hit label 1 | start from i).
The harmonic solution (Source: J. Zhu)
We want to find the harmonic function f that satisfies f(x) = y for all labeled points x and minimizes the energy ∑_{i∼j} wij (f(xi) − f(xj))².
It can be shown that f(xi) = ∑_{j∼i} wij f(xj) / ∑_{j∼i} wij at all unlabeled points xi.
Iterative algorithm to compute harmonic function:
Initially, fix f(x) = y for all labeled data and set f to arbitrary values for all unlabeled data.
Repeat until convergence: for each unlabeled xi, set f(xi) to its weighted neighborhood average: f(xi) = ∑_{j∼i} wij f(xj) / ∑_{j∼i} wij.
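The iterative averaging algorithm in code (a sketch; it assumes W has a zero diagonal, so each row sum equals the sum of neighbor weights):

```python
import numpy as np

def harmonic_iterative(W, labeled, y, n_iter=1000):
    """Compute the harmonic function by iterative averaging: clamp f = y on
    labeled nodes, then repeatedly replace each unlabeled f(xi) by the
    weighted average of its neighbors' values."""
    n = W.shape[0]
    f = np.zeros(n)
    f[labeled] = y                                  # clamp labeled nodes
    unlabeled = [i for i in range(n) if i not in set(labeled)]
    for _ in range(n_iter):
        for i in unlabeled:
            f[i] = W[i] @ f / W[i].sum()            # weighted neighbor average
    return f
```

On a path graph 0–1–2–3 with f(0) = 0 and f(3) = 1 clamped, the iteration converges to the linear interpolation f = (0, 1/3, 2/3, 1), matching the random-walk interpretation above.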
The graph Laplacian (Source: J. Zhu)
Let W be a symmetric weight matrix with entries wij, and D be a diagonal matrix with entries Dii = ∑j wij.
The graph Laplacian matrix is defined as L = D −W .
Then we can write ∑_{i∼j} wij (f(xi) − f(xj))² = fᵀLf.
We want to minimize fᵀLf subject to the constraints f(xi) = yi on labeled data.
Solution: fu = −Luu⁻¹ Lul yl, where yl are the labels for the labeled data, and L = [ Lll  Llu ; Lul  Luu ] (blocks indexed by labeled/unlabeled points).
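The closed-form solution in code (numpy; the index bookkeeping for the labeled/unlabeled partition is mine):

```python
import numpy as np

def harmonic_closed_form(W, labeled, y_l):
    """Closed-form harmonic solution: with L = D − W partitioned into
    labeled/unlabeled blocks, the unlabeled values are fu = −Luu⁻¹ Lul yl."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                  # graph Laplacian D − W
    u = [i for i in range(n) if i not in set(labeled)]
    Luu = L[np.ix_(u, u)]                           # unlabeled-unlabeled block
    Lul = L[np.ix_(u, labeled)]                     # unlabeled-labeled block
    return -np.linalg.solve(Luu, Lul @ np.asarray(y_l, float))
```

On the same path graph as before this returns (1/3, 2/3) for the two interior nodes, agreeing with the iterative algorithm but in a single linear solve.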
The graph Laplacian (Source: J. Zhu)
Alternative approach: allow f(xi) to be different from yi on labeled data, but penalize it:
min_f  ∑_{i labeled} c (f(xi) − yi)² + fᵀLf
Let C be a diagonal matrix where Cii = c if i is a labeled point, and Cii = 0 otherwise. Then we can write the objective function as
min_f  (f − y)ᵀC(f − y) + fᵀLf
where y is a vector whose entries correspond to labels of labeled points, and are arbitrary otherwise.
Then the solution is given by the linear system
(C + L)f = Cy.
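The penalized variant reduces to one linear solve. A sketch (numpy; the default penalty c is a placeholder of mine):

```python
import numpy as np

def harmonic_regularized(W, labeled, y, c=100.0):
    """Soft-label variant: solve (C + L) f = C y, where C puts weight c on
    labeled diagonal entries and 0 elsewhere."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                  # graph Laplacian D − W
    C = np.zeros((n, n))
    yv = np.zeros(n)                                # arbitrary (0) off labels
    for i, yi in zip(labeled, y):
        C[i, i] = c
        yv[i] = yi
    return np.linalg.solve(C + L, C @ yv)
</```

As c → ∞ the penalty becomes a hard constraint and the solution approaches the exact harmonic function from the previous slide.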
Graph spectrum (Source: J. Zhu)
The spectrum of the graph represented by W is given by the eigenvalues and eigenvectors (λi, φi), i = 1, …, n, of the Laplacian L.
Properties of the graph spectrum:
A graph has k connected components if and only if λ1 = λ2 = … = λk = 0. The corresponding eigenvectors are constant on individual connected components, and zero elsewhere.
L = ∑_{i=1}^n λi φi φiᵀ.
Any function f on the graph can be written as a linear combination of eigenvectors: f = ∑_{i=1}^n ai φi.
The “smoothness” of f can be written as fᵀLf = ∑_{i=1}^n ai² λi.
Using the graph spectrum
Objective function
min_f  ∑_{i labeled} c (f(xi) − yi)² + fᵀLf = (f − y)ᵀC(f − y) + fᵀLf.
We can restrict our solution to “smooth” functions f, i.e., linear combinations of the first k eigenvectors associated with the smallest eigenvalues: f = ∑_{i=1}^k ai φi.
Now we can obtain f by solving a k × k linear system instead of an n × n linear system.
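A sketch of the spectral shortcut (with k = n it reproduces the full solution exactly; smaller k gives the cheaper smooth approximation; the function name and default c are mine):

```python
import numpy as np

def harmonic_spectral(W, labeled, y, k, c=100.0):
    """Expand f = Σ_{i≤k} a_i φ_i in the k smoothest Laplacian eigenvectors
    and solve the resulting k×k system for the coefficients a."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    lam, phi = np.linalg.eigh(L)          # eigenvalues in ascending order
    Phi = phi[:, :k]                      # n×k basis of smooth eigenvectors
    C = np.zeros((n, n))
    yv = np.zeros(n)
    for i, yi in zip(labeled, y):
        C[i, i] = c
        yv[i] = yi
    # minimize (Φa − y)ᵀC(Φa − y) + aᵀ diag(λ1..λk) a over the k coefficients,
    # using ΦᵀLΦ = diag(λ1..λk) for orthonormal eigenvectors
    A = Phi.T @ C @ Phi + np.diag(lam[:k])
    b = Phi.T @ C @ yv
    return Phi @ np.linalg.solve(A, b)
```

The k×k system is the whole point: for a large graph, eigendecomposition can be truncated to the first k eigenvectors, and the per-query cost no longer scales with n³.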
References
J. Zhu, Semi-supervised learning survey, University of Wisconsin technical report, 2008. http://pages.cs.wisc.edu/~jerryzhu/research/ssl/semireview.html
J. Zhu, Semi-supervised learning tutorial, Chicago Machine Learning Summer School, 2009. http://pages.cs.wisc.edu/~jerryzhu/pub/sslchicago09.pdf