
Interactive Learning using Manifold Geometry

Eric Eaton, Gary Holness, and Daniel McFarlane

Lockheed Martin Advanced Technology Laboratories

Artificial Intelligence Research Group

This work was supported by internal funding from Lockheed Martin and the National Science Foundation under NSF ITR #0325329.


Introduction: Motivation

Information monitoring systems use a scoring function f to focus user attention

– f is customized to the current situation

– Often, no data are available to learn f

Maritime Situational Awareness

Network Security Monitoring

– Users require fine control over the scoring function

We propose an interactive learning method that enables the user to iteratively refine f


Introduction: Interactive Refinement

Uses a combination of manual input and machine learning:

1. The user manually selects and repositions a data point

2. The system relearns the model f and updates the scatterplot

Key idea: each adjustment should generalize naturally to the model

We use least squares with Laplacian regularization to learn f, based on the manifold underlying the data

[Figure: scatterplots of relevancy against a 1D projection of the data, shown as a User View and a Model View]


Related Work: Interactive Learning

Crayons tool for interactive object classification (Fails & Olsen, 2003)

Interactive decision tree construction (Ware et al., 2001)

Interactive visual clustering (desJardins et al., 2008)

Feature selection (Dy & Brodley, 2000)

Hierarchical clustering (Wills, 1998)

[Figure: Crayons by Fails & Olsen (figure used with permission)]

[Figure: Interactive Visual Clustering by desJardins et al. (figure used with permission), showing the initial view, the view after 2 adjustments, and the view after 14 adjustments]


Related Work: Interactive vs Active Learning

Active learning – selects instances for labeling by an oracle (Cohn et al., 1996; McCallum & Nigam, 1998; Tong, 2001)

Interactive ML vs. Active Learning

– Starts with: unlabeled data and an incorrect model (Interactive ML); unlabeled data and no model (Active Learning)

– Selection of instances: the user determines adjustments (Interactive ML); the system selects instances for labeling (Active Learning)

– Goal: collaborate with the user to define or adjust a model (Interactive ML); minimize the number of labels needed to learn a model (Active Learning)


Mechanisms for User Interaction

Data set X of instances x1, …, xn

The user supplies the initial scoring function

– We used a linear function for the initial scoring function

Current scoring function is given by f (initially the user-supplied function)

The user adjusts the score of individual data points to change f until it matches the true (hidden) function F

– Details of each instance are available in a side panel

– User selects and drags an instance up or down to change its score

Future work: similarity metric updates, qualitative feedback

[Figure: User View, a scatterplot of relevancy against a 1D projection of the data, with a side panel showing the details of the selected instance]

Score: 55
Id: dmaskes2
Event: ACL-Monitor
System: Julius-laptop
-------------------------------
Freq: 8 (1hr)
      8 (24hr)
-------------------------------
DETAILS:
UID: dmaskes2
Role: App_Update
Policy: finCloseLock
StartTime: 0 17 * * 5
EndTime: 0 8 * * 1
Res_type: triggerOverride
View_type: AcctClerk
DS_name: tbl_wklyTotals
Error: unauth_update
Value: (2 3 -2334 conf)


Approach: Learning the Scoring Function

Key Idea: each adjustment should generalize naturally to the model

– Adjustments should affect similar instances

– Generalizations should be based on the geometry underlying the data

Our approach:

– Construct the manifold underlying the data

– Learn/update f using the manifold’s basis

[Figure: graph over the data instances, with vertices labeled v1 through v15]


Approach: Constructing the Manifold

Represent data set X as an undirected graph G = (V,A), with vertex vi representing instance xi

Adjacency matrix A is constructed by:

– Connecting each instance to its k nearest neighbors

– Weighting each edge (vi, vj) by a radial basis function of the distance between xi and xj

G is a discrete approximation of the continuous manifold
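The graph construction above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `knn_rbf_adjacency`, the Gaussian form of the radial basis function, the bandwidth `sigma`, and the rule for symmetrizing the k-nearest-neighbor relation are choices made here, not details given on the slide.

```python
import numpy as np

def knn_rbf_adjacency(X, k=3, sigma=1.0):
    """Symmetric adjacency matrix of a k-nearest-neighbor graph with RBF edge weights.

    Assumptions (not from the slides): Euclidean distance, a Gaussian RBF
    exp(-d^2 / (2 sigma^2)), and an edge whenever either endpoint selects the other.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all instances.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    A = np.zeros((n, n))
    for i in range(n):
        # The k nearest neighbors of instance i, excluding i itself.
        order = np.argsort(sq_dist[i])
        neighbors = [j for j in order if j != i][:k]
        A[i, neighbors] = np.exp(-sq_dist[i, neighbors] / (2.0 * sigma ** 2))
    # The RBF weight is symmetric, so the elementwise maximum makes G undirected.
    return np.maximum(A, A.T)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))          # hypothetical data set
A_demo = knn_rbf_adjacency(X_demo, k=3, sigma=1.0)
```

Because the neighbor relation is symmetrized, a vertex can end up with more than k edges; an alternative is to keep only mutual neighbors, which gives a sparser graph.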

[Figure: graph whose vertices are marked "?", with example values 0.4, 0.9, and 0.8; label: initial scoring function]

Approach: Learning the Function on the Manifold

Form the graph Laplacian L of G (Chung, 1994)

Take the eigendecomposition L = Q Λ Q^T, where Λ = diag(λ1, λ2, λ3, …, λn)

Q = [q1 … qn] forms a complete orthonormal basis for G

[Figure: eigenvectors q1, q2, q5, q10, q20, and q50 visualized on a mesh. Meshes provided by Gabriel Peyré]

The first eigenvector is constant (λ1 = 0)
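A minimal sketch of this step, assuming the unnormalized Laplacian D − A (the slide does not say which Laplacian variant is used) and a small path graph as a stand-in for G. The names `laplacian_eigenbasis`, `A_path`, and so on are illustrative only.

```python
import numpy as np

def laplacian_eigenbasis(A):
    """Return (L, lam, Q) with L = D - A and L = Q diag(lam) Q^T.

    Assumes the unnormalized graph Laplacian; a normalized variant would
    change the eigenvectors and the constant-eigenvector property.
    """
    A = np.asarray(A, dtype=float)
    D = np.diag(A.sum(axis=1))       # diagonal degree matrix
    L = D - A
    # eigh is meant for symmetric matrices: eigenvalues come back in
    # ascending order and the eigenvectors are orthonormal columns of Q.
    lam, Q = np.linalg.eigh(L)
    return L, lam, Q

# Hypothetical example: a connected path graph on 5 vertices with unit weights.
A_path = np.zeros((5, 5))
for i in range(4):
    A_path[i, i + 1] = A_path[i + 1, i] = 1.0

L_path, lam_path, Q_path = laplacian_eigenbasis(A_path)
```

For a connected graph the smallest eigenvalue is zero and its eigenvector has the same entry at every vertex, which matches the note about the first eigenvector.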


Approach: Learning the Function on the Manifold

The scoring function f : V → [0,1] is given by f = QW

Fit W by least squares with Laplacian regularization:

– This is a special case of Belkin et al.’s (2006) Manifold Regularization

– Eigenvalues Λ increasingly regularize the higher-order components

A slider bar controls the weight of adjustments
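The fitting equation itself did not survive in this transcript, so the following is only one plausible formulation, stated as an assumption: minimize Σ_i c_i ((QW)_i − y_i)² + γ W^T Λ W, where y holds the target scores, c_i is a per-instance weight (a larger value for user-adjusted instances, standing in for the slider), and γ is a hypothetical regularization strength. Setting the gradient to zero gives W = (Q^T C Q + γΛ)^{-1} Q^T C y with C = diag(c).

```python
import numpy as np

def fit_weights(Q, lam, y, c, gamma=1.0):
    """Solve min_W sum_i c_i ((Q W)_i - y_i)^2 + gamma * W^T diag(lam) W.

    Assumed objective (the slide's equation is missing): c weights each
    instance and gamma scales the Laplacian penalty.
    """
    QtC = Q.T * c                                  # equals Q^T @ diag(c)
    lhs = QtC @ Q + gamma * np.diag(lam)
    return np.linalg.solve(lhs, QtC @ y)

# Hypothetical example: path graph on 6 vertices, unnormalized Laplacian.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
lam, Q = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)

y = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
y[2] = 0.9                                         # the user drags instance 2 up

c_low = np.ones(n)                                 # adjustment weighted like any other score
c_high = np.ones(n)
c_high[2] = 100.0                                  # slider pushed toward the adjustment

f_low = Q @ fit_weights(Q, lam, y, c_low)
f_high = Q @ fit_weights(Q, lam, y, c_high)
```

Under this formulation a larger weight on the adjusted instance pulls its fitted score closer to the value the user chose, while the eigenvalue penalty keeps neighboring scores smooth.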


Complete Algorithm for Interactive Refinement

Given: the data X and the user’s initial scoring function

Set f to the initial scoring function

Construct the manifold underlying X, represented by G = (V,A)

Compute the graph Laplacian of G

Compute the eigenvectors Q and eigenvalues Λ of the graph Laplacian

Repeat

– Display the scatterplot of X using the scores given by f

– (Optional) The user adjusts the score of data instance xi

– (Optional) The user updates the adjustment weight ω via a slider bar

– If there were changes, update the scoring function as f = QW, where W is fit by least squares with Laplacian regularization
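The loop above can be sketched end to end as follows. This is an illustration rather than the authors' implementation: user interaction is replaced by a fixed list of `(index, new_score)` pairs, the graph uses Gaussian RBF weights on a symmetrized k-nearest-neighbor graph, the Laplacian is the unnormalized D − A, and the fit minimizes an assumed objective Σ_i c_i (f_i − y_i)² + γ W^T Λ W with c_i = ω for adjusted instances. All function and parameter names are hypothetical.

```python
import numpy as np

def build_basis(X, k=3, sigma=1.0):
    """kNN graph with RBF weights, then eigenpairs of L = D - A (assumed variant)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    n = len(X)
    A = np.zeros((n, n))
    for i in range(n):
        # Position 0 of the sort is the point itself (assumes no duplicate points).
        nbrs = np.argsort(sq[i])[1:k + 1]
        A[i, nbrs] = np.exp(-sq[i, nbrs] / (2.0 * sigma ** 2))
    A = np.maximum(A, A.T)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.eigh(L)                       # (eigenvalues, eigenvectors)

def refine(X, initial_scores, adjustments, omega=10.0, gamma=1.0):
    """Replay a list of user adjustments and return the final and intermediate scores."""
    lam, Q = build_basis(X)
    y = np.array(initial_scores, dtype=float)      # target scores
    c = np.ones(len(y))                            # per-instance weights
    f = y.copy()                                   # set f to the initial scores
    history = [f.copy()]
    for i, new_score in adjustments:               # stands in for the interactive loop
        y[i] = new_score                           # the user repositions instance i
        c[i] = omega                               # adjusted instances get weight omega
        QtC = Q.T * c
        W = np.linalg.solve(QtC @ Q + gamma * np.diag(lam), QtC @ y)
        f = Q @ W                                  # updated scoring function f = QW
        history.append(f.copy())
    return f, history

# Hypothetical data: 30 points on a curve, initial scores linear in the first feature.
t = np.linspace(0.0, 1.0, 30)
X_demo = np.column_stack([t, t ** 2])
f_final, history = refine(X_demo, t, [(0, 0.9), (29, 0.1)])
```

The eigendecomposition is computed once, so each adjustment only costs one linear solve. The sketch does not clip f to [0,1]; whether and how to do that is left open here.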


Scaling to Large Volumes of Data

A can be stored efficiently as a symmetric banded matrix

The graph Laplacian is also a symmetric banded matrix

– Use sparse eigensolvers (e.g., Lanczos methods) for efficiency

Nyström method (Baker, 1977) extends the eigenvectors to new vertices for inductive learning

– Learn on a sample of the data, using the Laplacian of that sample

– Extend eigenvectors to new instances by

– Score for a new instance x (represented by vertex v) is then given by
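The sparse-eigensolver suggestion above can be illustrated with SciPy, whose `eigsh` routine wraps ARPACK's implicitly restarted Lanczos method. The sketch assumes the unnormalized Laplacian, a synthetic banded adjacency matrix in place of a real k-nearest-neighbor graph, and a small negative shift so that shift-invert mode can factor the (singular) Laplacian. It covers only the eigensolver, not the Nyström extension.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

def smallest_laplacian_eigenpairs(A, m=6, shift=-0.01):
    """Return (L, lam, Q) for the m smallest eigenpairs of the sparse Laplacian of A.

    Shift-invert around a slightly negative value targets the eigenvalues
    nearest zero; the shift keeps L - shift*I nonsingular.
    """
    L = laplacian(A.astype(float)).tocsc()         # unnormalized L = D - A
    lam, Q = eigsh(L, k=m, sigma=shift, which="LM")
    order = np.argsort(lam)                        # sort explicitly, smallest first
    return L, lam[order], Q[:, order]

# Hypothetical banded graph: each vertex linked to its two neighbors on either side.
n = 200
A_band = diags([1.0, 1.0, 1.0, 1.0], [-2, -1, 1, 2], shape=(n, n), format="csr")
L_band, lam_m, Q_m = smallest_laplacian_eigenpairs(A_band, m=6)
```

Keeping only the first m eigenvectors restricts f = QW to the smoothest components of the graph, which trades some flexibility for memory and speed on large data sets.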


Evaluation

Simulate user by adjusting the current “most incorrect” instance to the correct score

– Users are adept at identifying outliers, motivating our approach

– The initial scoring function is a linear model fit to X using ridge regression

Compared against interactive learning using:

– SMO support vector regression with an RBF kernel

– Least squares regularized with a ridge parameter of 10E-8
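The simulated-user protocol can be sketched as below. Everything specific is an assumption made for illustration: a chain graph stands in for the manifold, the hidden function F is a sine curve, the initial model is a ridge fit with a penalized intercept, and each refit minimizes an assumed objective Σ_i c_i (f_i − y_i)² + γ W^T Λ W. The baselines (SVR, ridge-regularized least squares) are not reproduced, and the numbers this produces say nothing about the reported results.

```python
import numpy as np

def ridge_predict(X, y, ridge=1e-3):
    """Fit a linear model with intercept by ridge regression; return predictions on X."""
    Xb = np.column_stack([X, np.ones(len(X))])     # the intercept column is penalized too
    w = np.linalg.solve(Xb.T @ Xb + ridge * np.eye(Xb.shape[1]), Xb.T @ y)
    return Xb @ w

def simulate_user(Q, lam, f0, F, n_steps=10, omega=10.0, gamma=1.0):
    """Repeatedly correct the currently most incorrect instance and refit f = QW."""
    y, c, f = f0.copy(), np.ones(len(f0)), f0.copy()
    rmse = [float(np.sqrt(np.mean((f - F) ** 2)))]
    for _ in range(n_steps):
        i = int(np.argmax(np.abs(f - F)))          # current "most incorrect" instance
        y[i], c[i] = F[i], omega                   # simulated user sets the correct score
        QtC = Q.T * c
        W = np.linalg.solve(QtC @ Q + gamma * np.diag(lam), QtC @ y)
        f = Q @ W
        rmse.append(float(np.sqrt(np.mean((f - F) ** 2))))
    return f, rmse

# Hypothetical setup: 40 instances on a line, joined in a chain graph.
n = 40
t = np.linspace(0.0, 1.0, n)
X_demo = t[:, None]
F_true = 0.5 + 0.5 * np.sin(2.0 * np.pi * t)       # hidden target function
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
lam, Q = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)

f0 = ridge_predict(X_demo, F_true)                 # initial linear scoring function
f_end, rmse = simulate_user(Q, lam, f0, F_true, n_steps=10)
```

Plotting `rmse` against the number of adjustments gives a learning curve of the kind the evaluation slides describe.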

Data Sets

Name                     #Inst  #Dim  Source
CPU                      209    6     UCI repository
Heart Disease            303    13    UCI repository
Pharynx                  195    10    Kalbfleisch & Prentice (1980)
Pyrimidines              74     27    King et al. (1992)
Sleep                    62     7     StatLib archive
Wisconsin Breast Cancer  194    32    UCI repository


Evaluation: Adjusting the “most incorrect” instance


Evaluation: Adjusting a random instance (100 trials)


Related Work: Manifold Learning

Belkin et al.’s (2006) Manifold Regularization

– We use a special case regularizing only the solution’s smoothness

Semi-supervised learning using Gaussian random fields (Zhu et al., 2003; Cai et al., 2007)

Zhou et al.’s (2004; 2005) “Distribution Regularization”

– Uses a regularized form of the graph Laplacian as the basis

– Learns a function

Spectral Graph Transduction (Joachims, 2003)


Conclusion and Future Work

We presented a method for interactive learning based on least squares with Laplacian regularization

Manifold-based interactive learning continuously improves with each correction

In practice, the technique shows an interactive response time for hundreds of data instances

Future Work:

– User adjustment of the similarity metric between data instances

– Incorporate passive observation of the user

– Handle drifting or recurring concepts


Thank You! Questions?

Eric [email protected]

This work was supported by internal funding from Lockheed Martin and the National Science Foundation under NSF ITR #0325329.


References

Baker, C. T. H. 1977. The Numerical Treatment of Integral Equations. Oxford: Clarendon Press.

Belkin, M.; Niyogi, P.; and Sindhwani, V. 2006. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research 7:2399-2434.

Cai, D.; He, X.; and Han, J. 2007. Spectral regression: A unified subspace learning framework for content-based image retrieval. In Proceedings of the 15th International Conference on Multimedia, 403-412. ACM Press.

Chung, F. R. K. 1994. Spectral Graph Theory. Number 92 in CBMS Regional Conference Series in Mathematics. Providence, RI: American Mathematical Society.

Cohn, D. A.; Ghahramani, Z.; and Jordan, M. I. 1996. Active learning with statistical models. Journal of Artificial Intelligence Research 4:129-145.

desJardins, M.; MacGlashan, J.; and Ferraioli, J. 2008. Interactive visual clustering for relational data. In Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall. 329-356.

Dy, J. G., and Brodley, C. E. 2000. Visualization and interactive feature selection for unsupervised data. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 360-364. ACM Press.

Fails, J. A., and Olsen, Jr., D. R. 2003. Interactive machine learning. In Proceedings of the Eighth International Conference on Intelligent User Interfaces, 39-45. Miami, FL: ACM Press.

Joachims, T. 2003. Transductive learning via spectral graph partitioning. In Proceedings of the International Conference on Machine Learning, 290-297.

McCallum, A., and Nigam, K. 1998. Employing EM in pool-based active learning for text classification. In Proceedings of Fifteenth International Conference on Machine Learning, 359-367. San Francisco, CA: Morgan Kaufmann.

Tong, S. 2001. Active Learning: Theory and Applications. Ph.D. Dissertation, Stanford University.

Ware, M.; Frank, E.; Holmes, G.; Hall, M.; and Witten, I. H. 2001. Interactive machine learning: Letting users build classifiers. International Journal of Human Computer Studies 55(3):281-292.

Wills, G. J. 1998. An interactive view for hierarchical clustering. In Proceedings of the 1998 IEEE Symposium on Information Visualization (INFOVIS), Washington, DC, USA: IEEE Computer Society.

Zhou, D.; Huang, J.; and Schölkopf, B. 2005. Learning from labeled and unlabeled data on a directed graph. In Proceedings of the International Conference on Machine Learning, 1036-1043. Bonn, Germany: ACM Press.