Active Semi-Supervised Learning using Submodular Functions

Andrew Guillory, Jeff Bilmes
University of Washington
Given unlabeled data
for example, a graph
Learner chooses a labeled set
Nature reveals labels
Learner predicts labels
Learner suffers loss
[Figure: predicted vs. actual labelings of the graph]
Basic Questions
• What should we assume about the labels y?
• How should we predict ŷ using the labeled set L?
• How should we select L?
• How can we bound error?
Outline
• Previous work: learning on graphs
• More general setting using submodular functions
• Experiments
Learning on graphs
• What should we assume about y?
• Standard assumption: the cut value Φ(y) is small
• A “smoothness” assumption

Example: Φ(y) = 2 [figure: a labeled graph with two cut edges]
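As a toy illustration of the smoothness measure, Φ(y) can be computed by counting edges whose endpoints disagree. The graph and labels below are made up for illustration, not the example from the slides:

```python
# Hypothetical example: cut value Phi(y) of a labeling on a small graph.

def cut_value(edges, labels):
    """Number of edges whose endpoints receive different labels."""
    return sum(1 for u, v in edges if labels[u] != labels[v])

# Two triangle clusters {0,1,2} and {3,4,5} joined by two cross edges.
edges = [(0, 1), (1, 2), (0, 2),   # cluster 1
         (3, 4), (4, 5), (3, 5),   # cluster 2
         (2, 3), (0, 5)]           # cross edges
labels = {0: +1, 1: +1, 2: +1, 3: -1, 4: -1, 5: -1}

print(cut_value(edges, labels))  # -> 2 (only the two cross edges are cut)
```

A labeling that follows cluster structure has a small cut value, which is exactly the smoothness assumption above.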
Prediction on graphs
• How should we predict ŷ using L?
• Standard approach: min-cut (Blum & Chawla 2001)
• Choose ŷ to minimize Φ(ŷ) s.t. ŷ agrees with y on L
• Reduces to a standard min-cut computation
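The min-cut prediction rule can be sketched by brute force on a tiny hypothetical graph (in practice this step reduces to a polynomial-time s-t min-cut computation, as the slide notes):

```python
from itertools import product

def cut_value(edges, labels):
    """Phi: number of edges whose endpoints receive different labels."""
    return sum(1 for u, v in edges if labels[u] != labels[v])

def mincut_predict(nodes, edges, observed):
    """Among all labelings agreeing with `observed`, return one that
    minimizes Phi.  Exponential-time sketch for illustration only."""
    free = [v for v in nodes if v not in observed]
    best, best_cut = None, float("inf")
    for assignment in product([+1, -1], repeat=len(free)):
        labels = dict(observed)
        labels.update(zip(free, assignment))
        c = cut_value(edges, labels)
        if c < best_cut:
            best, best_cut = labels, c
    return best

# Two triangles joined by the bridge (2, 3); observe one node per cluster.
nodes = range(6)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
pred = mincut_predict(nodes, edges, {0: +1, 4: -1})
print([pred[v] for v in nodes])  # -> [1, 1, 1, -1, -1, -1]
```

The prediction cuts only the bridge edge, propagating each observed label through its cluster.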
Active learning on graphs

• How should we select L?
• In previous work, we propose the following objective:
  Ψ(L) = min over nonempty T ⊆ V∖L of Γ(T) / |T|
  where Γ(T) is the cut value between T and V∖T
• Small Ψ(L) means an adversary can cut away many points from L without cutting many edges

Examples: Ψ(L) = 1/8 vs. Ψ(L) = 1 [figure: two different choices of labeled set]
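Under the form Ψ(L) = min over nonempty T ⊆ V∖L of Γ(T)/|T|, the objective can be computed by brute force on a small hypothetical graph. Labeling one node per cluster raises Ψ:

```python
from itertools import combinations

def cut(edges, S):
    """Gamma: number of edges crossing between S and the rest."""
    S = set(S)
    return sum(1 for u, v in edges if (u in S) != (v in S))

def psi(nodes, edges, L):
    """Psi(L) = min over nonempty T subset of V \\ L of cut(T)/|T|.
    Brute force over all subsets -- a sketch, not a practical method."""
    rest = [v for v in nodes if v not in L]
    return min(cut(edges, T) / len(T)
               for r in range(1, len(rest) + 1)
               for T in combinations(rest, r))

# Two triangles joined by the bridge (2, 3).
nodes = range(6)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Labeling only node 0: the adversary cuts off {3,4,5} with one edge.
print(psi(nodes, edges, {0}))     # -> 1/3
# Labeling one node per cluster makes cutting anything off expensive.
print(psi(nodes, edges, {0, 3}))  # -> 1.0
```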
Error bound for graphs

• How can we bound error?

Theorem (Guillory & Bilmes 2009): Assume ŷ minimizes Φ(ŷ) subject to ŷ agreeing with y on L. Then the number of mistakes is at most 2Φ(y)/Ψ(L).

• Intuition: the mistake set M is disjoint from L, so Ψ(L) ≤ Γ(M)/|M|, while Γ(M) ≤ Φ(y) + Φ(ŷ) ≤ 2Φ(y)
• Note: deterministic, holds for adversarial labels
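The bound can be checked empirically on a small hypothetical graph: for every possible true labeling y, every minimum-cut prediction consistent with the observed labels makes at most 2Φ(y)/Ψ(L) mistakes:

```python
from itertools import combinations, product

nodes = list(range(6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
L = [0, 3]  # one labeled node per cluster

def phi(labels):                      # cut value of a labeling (by index)
    return sum(1 for u, v in edges if labels[u] != labels[v])

def cut(S):                           # Gamma: edges crossing out of S
    S = set(S)
    return sum(1 for u, v in edges if (u in S) != (v in S))

rest = [v for v in nodes if v not in L]
psi = min(cut(T) / len(T)
          for r in range(1, len(rest) + 1)
          for T in combinations(rest, r))

ok = True
for y in product([1, -1], repeat=len(nodes)):
    # all labelings agreeing with y on L, and the min-cut value among them
    feasible = [yh for yh in product([1, -1], repeat=len(nodes))
                if all(yh[v] == y[v] for v in L)]
    best = min(phi(yh) for yh in feasible)
    for yh in feasible:
        if phi(yh) == best:           # every minimizer, any tie-breaking
            mistakes = sum(a != b for a, b in zip(yh, y))
            ok = ok and (mistakes <= 2 * phi(y) / psi)
print(ok)  # -> True: the bound holds for all 64 labelings
```

This also matches the deterministic, adversarial nature of the result: no distributional assumption on y is used anywhere.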
Drawbacks to previous work
• Restricted to graph-based, min-cut learning
• Not clear how to efficiently maximize Ψ(L)
  – Can compute Ψ(L) in polynomial time (Guillory & Bilmes 2009)
  – Only heuristic methods known for maximizing it
  – Cesa-Bianchi et al. 2010 give an approximation for trees
• Not clear if this bound is the right bound
Our Contributions
• A new, more general bound on error parameterized by an arbitrarily chosen submodular function
• An active, semi-supervised learning method for approximately minimizing this bound
• Proof that minimizing this bound exactly is NP-hard
• Theoretical evidence this is the “right” bound
Outline
• Previous work: learning on graphs
• More general setting using submodular functions
• Experiments
Submodular functions

• A function F defined over a ground set V is submodular iff for all A ⊆ B ⊆ V and v ∉ B:
  F(A ∪ {v}) − F(A) ≥ F(B ∪ {v}) − F(B)
• Example: graph cut value Γ(S) (the number of edges crossing between S and V∖S)
• Real-world examples: influence in a social network (Kempe et al. 03), sensor coverage (Krause, Guestrin 09), document summarization (Lin, Bilmes 11)
• F is symmetric if F(S) = F(V∖S) for all S
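The diminishing-returns definition can be verified exhaustively for the graph-cut function on a tiny hypothetical graph:

```python
from itertools import combinations

V = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]

def cut(S):
    """Graph cut: edges crossing between S and V \\ S."""
    S = set(S)
    return sum(1 for u, v in edges if (u in S) != (v in S))

def subsets(X):
    return [set(c) for r in range(len(X) + 1)
            for c in combinations(sorted(X), r)]

def is_submodular(F):
    """Exhaustive check of F(A+v) - F(A) >= F(B+v) - F(B), A ⊆ B, v ∉ B."""
    return all(F(A | {v}) - F(A) >= F(B | {v}) - F(B)
               for B in subsets(V)
               for A in subsets(B)
               for v in set(V) - B)

def is_symmetric(F):
    return all(F(S) == F(set(V) - S) for S in subsets(V))

print(is_submodular(cut), is_symmetric(cut))  # -> True True
```

Adding a node to a small set can only cut as many or more new edges than adding it to a superset, which is why cut passes the check.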
Submodular functions for learning
• Γ (cut value) is symmetric and submodular
• This makes it “nice” for learning on graphs
  – Easy to analyze
  – Can minimize exactly in polynomial time
• For other learning settings, other symmetric submodular functions make sense
  – Hypergraph cut is symmetric, submodular
  – Mutual information is symmetric, submodular
  – An arbitrary submodular function can be symmetrized
Generalized error bound
• Φ and Ψ are now defined in terms of F, not graph cut
• Each choice of F gives a different error bound
• Minimizing Φ(ŷ) s.t. agreement with y on L can be done in polynomial time (submodular function minimization)

Theorem: For any symmetric, submodular F, assume ŷ minimizes Φ(ŷ) subject to agreement with y on L. Then the number of mistakes is at most 2Φ(y)/Ψ(L).
Can we efficiently maximize Ψ(L)?
• Two related problems:
  1. Maximize Ψ(L) subject to |L| ≤ k
  2. Minimize |L| subject to Ψ(L) ≥ λ
• If Ψ were submodular, we could use well-known results for the greedy algorithm:
  – (1 − 1/e) approximation to (1) (Nemhauser et al. 1978)
  – logarithmic-factor approximation for (2) (Wolsey 1981)*
• Unfortunately Ψ is not submodular

*Assuming integer-valued Ψ
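For reference, here is a generic sketch of the greedy algorithm those results apply to, run on a hypothetical toy coverage function (this is not the surrogate objective from the talk, just the standard monotone-submodular greedy):

```python
def greedy_max(F, V, k):
    """Greedy for max F(S) s.t. |S| <= k.  Gives a (1 - 1/e)
    approximation when F is monotone submodular (Nemhauser et al. 1978)."""
    S = set()
    for _ in range(k):
        # pick the element with the largest marginal gain
        v = max((v for v in V if v not in S),
                key=lambda v: F(S | {v}) - F(S))
        if F(S | {v}) - F(S) <= 0:
            break
        S.add(v)
    return S

# toy coverage function: F(S) = number of elements covered by S
cover = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {6}}
F = lambda S: len(set().union(*(cover[v] for v in S))) if S else 0

print(sorted(greedy_max(F, list(cover), 2)))  # -> [0, 2], covering all 6
```

Because Ψ itself is not submodular, the paper cannot apply this greedy directly and instead maximizes a submodular surrogate, described next.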
Approximation result
• Define a surrogate objective F̂ s.t.
  – F̂ is submodular
  – F̂(L) is maximal iff Ψ(L) ≥ λ
• In particular we use one such surrogate
• Can then use standard methods for submodular maximization

Theorem: For any integer-valued, symmetric, submodular F and integer λ, greedily maximizing the surrogate gives L with Ψ(L) ≥ λ and |L| within a logarithmic factor of the smallest such set.
Can we do better?
• Is it possible to maximize Ψ(L) exactly? Probably not: we show the problem is NP-complete
  – Holds even if F is the graph cut function
  – Reduction from vertex cover on fixed-degree graphs
  – Corollary: no PTAS for the min-cost version
• Is there a strictly better bound? Not of the same form, up to the factor 2 in the bound
  – Holds without the factor of 2 for a slightly different version
  – No function larger than Ψ exists for which the bound holds
  – Suggests this is the “right” bound
Outline
• Previous work: learning on graphs
• More general setting using submodular functions
• Experiments
Experiments: Learning on graphs
• With F set to graph cut, we compared our method to random selection and the METIS heuristic
• We tried min-cut and label-propagation prediction
• We used benchmark data sets from Semi-Supervised Learning (Chapelle et al. 2006), using k-nearest-neighbor graphs, and two citation-graph data sets
Benchmark Data Sets

• Our method + label propagation is best in 6/12 cases, but there is no consistent, significant trend
• This suggests cut may not be well suited to kNN graphs
Citation Graph Data Sets

• Our method gives a consistent, significant benefit
• On these data sets the graph is not constructed by us (not kNN), so we expect more irregular structure
Experiments: Movie Recommendation
• Which movies should a user rate to get accurate recommendations from collaborative filtering?
• We pose this problem as active learning over a hypergraph encoding user preferences, setting F to hypergraph cut
• Two hyperedges for each user:
  – one connecting all movies the user likes
  – one connecting all movies the user dislikes
• Partitions with low hypergraph cut value are consistent (on average) with user preferences
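A minimal sketch of this hypergraph construction, with made-up users and movie ids (not the MovieLens data): a hyperedge is cut when it has movies on both sides of the partition, so a partition that respects every user's like/dislike split has cut value 0:

```python
def hypergraph_cut(hyperedges, S):
    """Number of hyperedges containing movies on both sides of the
    partition (S, V \\ S)."""
    S = set(S)
    return sum(1 for e in hyperedges
               if any(m in S for m in e) and any(m not in S for m in e))

# hypothetical users as (liked movies, disliked movies); movie ids 0..5
users = [({0, 1}, {4, 5}),
         ({0, 2}, {3, 5}),
         ({1, 2}, {3, 4})]
edges = [e for like, dislike in users for e in (like, dislike)]

print(hypergraph_cut(edges, {0, 1, 2}))  # -> 0: consistent with everyone
print(hypergraph_cut(edges, {0, 3, 4}))  # -> 4: splits four hyperedges
```

Hypergraph cut is symmetric and submodular, so the generalized bound and selection method apply unchanged.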
Movies Maximizing Ψ(S):
American Beauty, Star Wars Ep. IV, Jurassic Park, Fargo, Star Wars Ep. I, Forrest Gump, Wild Wild West (1999), The Blair Witch Project, Titanic, Mission: Impossible 2, Babe, The Rocky Horror Picture Show, L.A. Confidential, Mission to Mars, Austin Powers, Son in Law

Movies Rated Most Times:
Star Wars Ep. V, Star Wars Ep. VI, Saving Private Ryan, Terminator 2: Judgment Day, The Matrix, Back to the Future, The Silence of the Lambs, Men in Black, Raiders of the Lost Ark, The Sixth Sense, Braveheart, Shakespeare in Love

Using MovieLens data
Our Contributions
• A new, more general bound on error parameterized by an arbitrarily chosen submodular function
• An active, semi-supervised learning method for approximately minimizing this bound
• Proof that minimizing this bound exactly is NP-hard
• Theoretical evidence this is the “right” bound
• Experimental results